Executive summary 


The National Board for Professional Teaching Standards (NBPTS) is 
a professional organization that provides national certification to 
teachers who apply for and meet the Board’s standards of perfor- 
mance for “accomplished” educators. The certification process is vol- 
untary, and it is a time-consuming and rigorous one, requiring 
applicants to furnish a portfolio containing videos of their instruc- 
tion, copies of their students’ work, and written reflections on their 
instruction, as well as to complete online exercises that assess their 
pedagogical and subject-matter knowledge. 


The National Board certification (NBC) process is a research-based 
program that was developed over a 10-year period with financing 
from the National Science Foundation, the U.S. Department of Edu- 
cation, and private funders. Only experienced, certified educators 
with at least a bachelor’s degree are eligible to apply. The certifica- 
tion process can take a few months to two years. Teachers who are 
unsuccessful may refine and resubmit portions of their application 
and/or retake the exercises to raise their score and achieve certifica- 
tion on a second or third attempt. 


Because of the significant resources involved, both in the develop- 
ment of the standards and in the application process, there has been 
a good deal of attention focused on NBC’s value and effectiveness. 


In 2008, the National Academy of Sciences’ National Research Coun- 
cil released a report reviewing the evidence on the NBC’s effective- 
ness. The Council concluded that, “The evidence is clear National 
Board certification distinguishes more effective teachers from less ef- 
fective teachers with respect to student achievement” (2008, p. 179). 
But the extant literature left understudied, and unresolved, whether 
the certification process itself also improves teachers’ effectiveness by 
augmenting human capital—the intrinsic capability of a teacher to 
teach effectively, which may be increased through experience, educa- 
tion, training, and professional development. 


The Council also noted that the large-scale statistical studies pertain- 
ing to National Board certification focused almost exclusively on 
teachers in Florida and North Carolina, and on the elementary 
grades. Furthermore, virtually all of the analyses focused only on the 
test scores of students in mathematics or reading. 


Study goals 


This study responds to a request from the NBPTS to analyze National 
Board certification among high school teachers in understudied sub- 
ject areas and locales to help fill gaps in the research literature. We 
also were asked to use multiple indicators of performance. 


Approach 


The research team selected two new locales for this analysis, the 
Commonwealth of Kentucky and the Chicago public schools. Chica- 
go, a racially and ethnically diverse city with a population of more 
than 2.8 million, has one of the largest urban school districts in the 
country. Kentucky, by contrast, is a largely rural state with some sub- 
urban and urban areas, including the Louisville/Jefferson County 
metro area, population 750,000. Together, these two locales encom- 
pass a full range of public school settings. 


The proliferation of longitudinal data systems that allow researchers 
to link students to their subject-area teachers and to track student 
performance over time provides new opportunities to examine 
NBPTS processes in these new locations. In addition, both school sys- 
tems use ACT’s Educational Planning and Assessment System (EPAS) 
to monitor the academic progress of their high school students. EPAS 
comprises three assessments: the EXPLORE®, given in grade 8 or 9; 
the PLAN®, given in grade 10; and the ACT®, given in grade 11 or 12. 
Each assessment includes subtests in English, mathematics, science, 
reading, and writing. In this study we use test scores in the first three 
subject areas to examine outcomes for high school students whose 
teachers participated in the NBC process and for high school stu- 
dents whose teachers did not participate. 


In addition to examining student test scores, we also conducted class- 
room observations of the instructional practices of high school 
teachers in science and mathematics, comparing a sample of NBC 
applicants and similar teachers not pursuing this certification. We 
conducted these observations at baseline (that is, in the semester 
when the NBC applicants first submitted their applications for certifi- 
cation) and then again in each of the next two semesters. Most of 
the comparison teachers came from the same schools as the NBC ap- 
plicants and were observed on the same days. 


We used the Leadership by Design (LBD) classroom observation in- 
strument to assess instruction. Teachers were rated on nine different 
dimensions of instruction: lesson overview, instructional overview, 
questioning, classroom atmosphere, concept development, teacher's 
content knowledge, learning climate, classroom management, and 
assessments. Teachers also were given an overall instructional-quality 
rating by the site observers. 


Research questions 


In order to get a thorough understanding of the effects of National 
Board certification, we addressed the following four questions: 


1. Does the National Board certification process influence 
teachers’ classroom practices? 


As measured by student test scores: 


2. Are National Board-certified teachers more effective than 
other teachers? 


3. Are applicants who attain National Board certification more 
effective than applicants who do not? 


4. What effect, if any, does the National Board certification pro- 
cess have on teacher effectiveness? 


Question 1 is addressed by examining instructional practices over 
time for NBC applicants compared with non-applicants. To address 
questions 2 through 4, we compare outcomes for students taught by National 
Board-certified teachers with those taught by non-certified teachers, 
developing three different modeling frameworks that measure, re- 
spectively, the efficacy of National Board certification in “signaling” 
teacher effectiveness, “screening” for teacher effectiveness, and “hu- 
man capital” formation that increases teacher effectiveness. 


Findings 


Ratings of the instructional practices of NBC applicants exceeded 
those of non-applicants at baseline on six of the nine teaching-quality 
subscales, as well as on the overall rating of instructional quality. 
However, there was little evidence of growth in instructional quality 
over the observation period for either applicants or non-applicants. 


Our analyses of student test scores considered five different model 
specifications, and student achievement gains were estimated for 
PLAN and ACT scores in English, mathematics, and science. The 
baseline model controls for a rich set of student characteristics, in- 
cluding prior test score. Subsequent models add school characteris- 
tics and the average pretest score of all students assigned to a given 
teacher. These models help to correct for the nonrandom assignment 
of students to schools and to teachers that may affect measurements 
of teacher effectiveness. A final model replaces school characteristics 
with school fixed effects, providing comparisons of teacher effective- 
ness within schools. 
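
As a rough illustration only, the progression of specifications described above might be written as follows. The notation is an assumption introduced here, not the report's own: $A_{ijst}$ is the test score of student $i$ taught by teacher $j$ in school $s$ in year $t$, $A_{i,t-1}$ is the prior test score, $X_{it}$ is a vector of student characteristics, $S_{st}$ is a vector of school characteristics, $\bar{A}_{j,t-1}$ is the average pretest score of the students assigned to teacher $j$, $\mu_s$ is a school fixed effect, and $\mathrm{NBCT}_{jt}$ indicates National Board certification. Only three representative forms are shown; the five specifications actually estimated may differ in detail.

```latex
% Baseline: student characteristics and prior test score
A_{ijst} = \alpha A_{i,t-1} + X_{it}\beta + \delta\,\mathrm{NBCT}_{jt} + \varepsilon_{ijst}

% Adding school characteristics and the teacher-level average pretest score
A_{ijst} = \alpha A_{i,t-1} + X_{it}\beta + S_{st}\gamma + \lambda\,\bar{A}_{j,t-1}
           + \delta\,\mathrm{NBCT}_{jt} + \varepsilon_{ijst}

% Replacing school characteristics with school fixed effects
A_{ijst} = \alpha A_{i,t-1} + X_{it}\beta + \lambda\,\bar{A}_{j,t-1} + \mu_s
           + \delta\,\mathrm{NBCT}_{jt} + \varepsilon_{ijst}
```

In this sketch, $\delta$ is the coefficient of interest. With $\mu_s$ included, it is identified only from differences between certified and other teachers within the same school.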


We found evidence that Board certification is an effective “signal” of 
teacher quality. Although effect sizes varied, these results generally 
held across locales, test types, and subject areas. The estimated effect 
sizes are similar to those found elsewhere in the literature, and are 
smallest when National Board-certified teachers (NBCTs) were com- 
pared with other teachers in the same schools. 


The “screening” models compared student outcomes based on the 
amount of instruction students had from teachers who ever earned or 
would later earn Board certification during the study period and the 
amount of instruction students had from teachers who applied for 
National Board certification but were never certified. These models 
found some evidence that NBC effectively screens applicants. Results 
were strongest for mathematics, and weakest for English, and gener- 
ally did not reveal differences for within-school comparisons. 


We were unable to find evidence of a “human capital” effect indicat- 
ing that teacher effectiveness increased over time, based on student 
test scores for teachers in our sample, including those who advanced 
through the NBC process from pre-applicant to applicant or from 
applicant to post-applicant. 


Conclusions and recommendations 


Using data for high school teachers and their students from Chicago 
and Kentucky public schools, we found evidence that National Board 
certification is an effective signal of teacher quality, based on student 
test scores. We also found some evidence that the certification pro- 
cess successfully screens applicants based on their effectiveness. But 
we were unable to find evidence that the certification process itself 
enhances the instructional quality or effectiveness of teachers who 
choose to go through it. 


Our analysis of the professional development value of the National 
Board certification process as measured by changes in instructional 
practices was limited by the length of time over which we were able to 
observe teachers’ practices for changes, as well as by the inability to 
identify and observe teachers prior to their joining the applicant 
pool. It is quite likely that new applicants have already spent time pri- 
or to formally applying for Board certification reflecting on their 
practices, and possibly taking steps to improve those practices. In- 
deed, programs such as NBPTS’s own Take One! are designed to 
help teachers prepare for the application process before formally ap- 
plying. Our inability to observe teachers before they formally file 
their application may cause our estimates to understate the true im- 
pact of NBC on teaching practices. 


The analysis of improvements in teachers’ effectiveness as measured 
by their students’ test scores also was limited by the four-year period 
of the provided data, which dictated the number of teachers we were 
able to observe in each stage of the certification process. 


It is important to keep in mind that our findings about the human 
capital effects only pertain to the experienced teachers eligible to ap- 
ply for National Board certification. The results shed no light on the 
potential of the certification process to improve the instructional 
practices of less-experienced teachers (i.e., with fewer than three 
years of teaching) who are not eligible, or of less-able teachers who 
do not apply for certification. 


Nor does our analysis examine the role that the certification process 
might play in helping to identify specific areas of improvement for 
teachers who go through the process, or identify which elements of 
the applicant portfolio are most closely linked to teacher effective- 
ness, as measured by student test scores. 


Given that the National Board certification process has repeatedly 
demonstrated the ability to distinguish between more and less effec- 
tive teachers, school systems should think about how to make good 
use of this tool. For example, school systems could use National 
Board certification as a gatekeeper for advancement or as part of the 
tenure decision process, where tenure decisions are implemented at 
a later point in the teaching career path than the criteria most school 
systems currently use for those decisions. 


Introduction 


One of the most important issues facing education policy-makers is 
how to prepare students to be productive citizens in an increasingly 
competitive global economy. Evidence from state and national as- 
sessments provides a mixed picture as to whether states are success- 
fully doing so. While state accountability systems suggest that the 
proportion of students meeting state benchmarks is rising, perfor- 
mance on the National Assessment of Educational Progress (NAEP) 
has been relatively stagnant, especially in mathematics and among 17- 
year-olds (Rampey, Dion, & Donahue, 2009).¹ 


The teacher quality literature suggests that teachers are the single 
most important school-based input into student learning, and that 
teacher quality (as measured by a teacher’s contribution to student 
achievement on standardized tests) varies considerably across schools 
and also within a single school (Aaronson, Barrow, & Sander, 2007; 
Goldhaber, 2002; Rivkin, Hanushek, & Kain, 2005; Rockoff, 2004). 
These measures of teacher quality are, however, largely unrelated to 
any of the teacher characteristics generally available, such as highest 
level of education (Clotfelter, Ladd, & Vigdor, 2007; Goldhaber, 
2007); years of teaching experience beyond the first two or three 
(Clotfelter et al., 2007; Goldhaber, 2002; Rivkin et al., 2005); or indi- 
cators of ability such as selectivity of undergraduate institution or test 
scores (Goldhaber, 2002; 2007; Harris & Sass, 2007; Kane, Rockoff, & 
Staiger, 2008). So teachers are important to the learning process, but 
it is difficult to pinpoint specific measures that identify high-quality 
teachers. 


Improving teacher quality has been central to significant national 
education initiatives in the Bush and Obama administrations. No 
Child Left Behind (NCLB) is national legislation passed in 2001 that 
increased emphasis on state accountability systems. One of NCLB’s 
major mandates was that all students should be taught by “high- 
quality” teachers. The definition of high quality was that all teachers 
must be fully certified, have at least a bachelor’s degree, and demon- 
strate content area knowledge, although research (cited above) pub- 
lished since the passage of NCLB indicates that these particular 
indicators are not necessarily markers of high-quality teachers. 


1. The NAEP is the only nationally representative assessment of student 
achievement in the United States. It is funded by the U.S. Department of 
Education. Samples of 4th, 8th, and 12th grade students take the NAEP 
every other year. 


In 2009-10, the U.S. Department of Education initiated a grant com- 
petition called Race to the Top, in which states compete for federal 
education funding. To be competitive for these grants, states have to 
show commitment to improving the quality of teaching by designing 
and implementing better teacher evaluation systems, increasing equi- 
table access of students to good teachers and good principals, and 
improving the state of teacher preparation programs and teacher 
support. The component of teacher and principal quality gets the 
most weight in the competition. 


One way teachers can demonstrate their skill level and successes in 
the classroom is by earning certification from the National Board for 
Professional Teaching Standards (NBPTS). National Board was estab- 
lished to help professionalize the field of teaching by providing an 
accepted definition of what “accomplished” teaching is and recogniz- 
ing teachers who do their jobs exceptionally well. An original goal of 
National Board certification (NBC) was to build an authentic assess- 
ment system that could reliably measure what experienced teachers 
should know and be able to do (Carnegie Task Force on Teaching as 
a Profession, 1986). Educators would volunteer to participate in the 
program and those who successfully demonstrated the appropriate 
level of professionalism and expertise would be awarded a nationally 
recognized certificate attesting to that level of demonstrated perfor- 
mance. 


Since being established in 1987, NBPTS has certified more than 
100,000 teachers, and countless more have participated in the appli- 
cation process (NBPTS, 2013). Large investments have been made in 
the development of the National Board certification program. As of 
September 2005, the National Science Foundation and the U.S. De- 
partment of Education had appropriated more than $149 million 
to it, and nongovernment funders had spent an additional 
$261 million (Cohen & Rice, 2005). Applicants for certification (or 
more typically, their sponsoring school systems) also incur substantial 
costs. As a result, there is a great deal of interest in identifying and 
measuring the full value to education systems of encouraging teach- 
ers to become National Board certified. 


This study uses a two-pronged approach to examine the effectiveness 
of National Board-certified teachers and NBC applicants. As de- 
scribed in the first part of this report, we use classroom observations 
of teachers in the state of Kentucky and in Chicago Public Schools 
(CPS) to examine the quality of instructional practices of National 
Board applicants and non-applicants and whether teachers’ instruc- 
tional practices change over time. We observe outcomes for National 
Board certification applicants at the beginning, middle, and end of 
the process, and compare the results with a control group of non- 
applicants. 


As described in the second part of this report, we analyze administra- 
tive data for teachers and students, again from Kentucky and Chicago 
Public Schools, matching students to their demographic characteris- 
tics, multiple years of standardized test scores, and teachers. This al- 
lows us to examine signaling and screening effects of National Board 
certification, as well as human capital formation—that is, any profes- 
sional development benefits of the NBC process, as measured by im- 
provement in test scores of the students of National Board-certified 
teachers. 


Through this analysis we want to better understand how the National 
Board certification process relates to teaching effectiveness and to 
changes in teaching practice, and thus to improvements in student 
learning. Specifically we seek to answer these questions: 


1. Does the National Board certification process influence 
teachers’ classroom practices? 


As measured by student test scores: 


2. Are National Board-certified teachers more effective than 
other teachers? 


3. Are applicants who attain National Board certification more 
effective than applicants who do not? 


4. What effect, if any, does the National Board certification pro- 
cess have on teacher effectiveness? 


This report begins by describing the role of National Board in 
improving student learning and by reviewing the relevant litera- 
ture. Second, we describe the setting, the data sources, and the 
characteristics of the schools, teachers, and students in our sam- 
ple. Third, we describe our methods and findings from the class- 
room observations; we also describe the methods and findings 
from our analyses of student test scores. We conclude by summa- 
rizing the key findings, the limitations of this study, and the impli- 
cations both for future research and for practice. 


The role of National Board in improving 
student learning 


NBPTS developed a rigorous, multifaceted evaluation program for 
the purpose of identifying highly effective (“accomplished”) teachers. 
Applicants can select from among 25 certificate areas, which are 
based on the age of the students taught and the subject area of in- 
struction (not all subject areas are available in every age category).² 


Table 1: National Board certification subject areas and age categories 


Subject areas                     Age categories 

Art                               Early childhood (ages 3-8) 
Career and technical education    Middle childhood (ages 7-12) 
English as a new language         Early and middle childhood (ages 3-12) 
English language arts             Early childhood through young adulthood (ages 3-18+) 
Exceptional needs specialist      Early adolescence (ages 11-15) 
Generalist                        Adolescence and young adulthood (ages 14-18+) 
Health education                  Early adolescence through young adulthood (ages 11-18+) 
Library media 
Literacy 
Mathematics 
Music 
Physical education 
School counseling 
Science 
Social studies-History 
World language 


To apply, teachers must assemble and submit a portfolio of specific 
materials, including artifacts from their classroom instruction and 
student work, video of their classroom interactions with students, 
written reflections analyzing the instructional practice evident in the 
videos and student work, and a written statement that demonstrates 
their involvement in activities outside the classroom that benefit stu- 
dent learning. In addition, they must pass six in-depth computer- 
based “exercises,” essentially assessments of their content and peda- 
gogical knowledge in their specialty area (NBPTS, 2011). 


2. For more information, see http://www.nbpts.org/certificate-areas. 


In all, the process can take many months to two years. Applicants 
submit their application forms, fees, and proof of eligibility and begin 
developing their portfolios between February and December of the 
first year. Eligible applicants then take the computer-based assess- 
ments between March and June of the second year. At least one port- 
folio entry must be submitted by May of year two. Applicants have a 
maximum of two years to complete all the requirements, with the ca- 
veat that no portfolio entry can be more than 12 months old. Appli- 
cants do not find out their certification status until the following 
November or December. 


In an evolution of the original process, teachers who do not pass all 
sections of the certification may reapply and resubmit materials for 
the section(s) they did not pass previously. The reapplication cycle is 
1 year, as opposed to the initial 2-year application window. Once 
awarded, National Board certification is valid for 10 years, at which 
point teachers must reapply if they are interested in maintaining 
their certification status. 


The National Board certification process defines “accomplished” 
teaching based on five core propositions (NBPTS, 2002): 


• Proposition 1: Teachers are committed to students and their 
learning. 


• Proposition 2: Teachers know the subjects they teach and how 
to teach those subjects to students. 


• Proposition 3: Teachers are responsible for managing and 
monitoring student learning. 


• Proposition 4: Teachers think systematically about their prac- 
tice and learn from experience. 


• Proposition 5: Teachers are members of learning communities. 


NBPTS uses an “Architecture of Accomplished Teaching Helix” to il- 
lustrate what accomplished teaching looks like (see Figure 1). The 
process begins with the teacher understanding the needs of the stu- 
dents and setting appropriate goals for them. Then the teacher im- 
plements instruction based on those goals, evaluates learning related 
to the goals, and reflects on students’ learning. This is a continuous 
process, in that the teacher continually repeats it by setting new goals 
that are appropriate for students at the current time. 


Figure 1: National Board for Professional Teaching Standards “Architecture of Accomplished 
Teaching Helix.” 


SOURCE: NBPTS, 2012. 


Each NBC applicant is expected to demonstrate the five core propo- 
sitions in their video recording of a whole class discussion, commen- 
tary on the instruction evident in the video, and responses to written 
questions that guide the teacher to address the certification standards 
and the core propositions. The written commentary is expected to be 
analytic and reflective, demonstrating the teacher’s understanding of 
his or her own teaching practices and the students’ learning. 


Teachers who decide to apply for National Board certification gener- 
ally have many support options available to them. Many teachers ask a 
colleague to help them reflect on their practices and build a strong 
portfolio. A preparatory professional development program offered 
by NBPTS called Take One! provides teachers with information 
about the certification standards and allows them to submit a video 
portfolio entry for scoring prior to formally applying. Some districts 
and state departments of education, including the Kentucky Depart- 
ment of Education (KDE) and the Chicago Public Schools (CPS), 
have central office staff members dedicated to helping teachers be- 
come National Board certified. 


In Chicago, teachers have at least two options (one through the dis- 
trict and another through the teachers’ union) for ongoing candi- 
date support during the National Board application process. These 
programs provide weekly or biweekly meetings for candidate teachers 
to come together to review and revise their portfolios, as well as coun- 
seling on whether or not the time-consuming process is a good fit for 
them. In Kentucky, the Kentucky Education Association offers profes- 
sional learning opportunities for teachers interested in applying for 
certification or renewal. It also provides training for educators who 
are interested in serving as mentors to National Board candidates. 
Further, many postsecondary schools of education offer programs to 
help teachers prepare for the rigors of National Board certification. 


Putting all the pieces together, completing the NBC process requires 
a significant investment of time and effort. Because only teachers 
with at least three years of teaching experience are eligible to apply, 
National Board certification does not help principals make hiring de- 
cisions with less-experienced teachers. Yet, simply identifying high- 
quality teachers has no direct effect on the number of them in the 
profession. What impact, then, can National Board certification have 
on student learning? 


In this study, we investigate the main ways in which National Board 
can improve the quality of classroom teaching. The first has been the 
subject of much academic research—that being National Board certi- 
fied can serve as an indicator of teacher quality. This implies both 
that high-quality teachers apply for National Board certification (the 
signaling effect) and that the NBC process does a good job of screen- 
ing applicants and awarding certification to the most qualified (the 
screening effect). If certification is a good indicator of teacher quali- 
ty, then principals and district administrators can use National Board 
certification to inform their staffing and leadership decisions with 
experienced teachers. Namely, given a large enough supply of Na- 
tional Board-certified teachers, principals and school districts can 
improve average teacher quality by staffing a large number of teach- 
ing positions with National Board-certified teachers. 


A second way in which National Board certification might improve 
average teacher quality is by using the process as part of a framework 
for better managing the teacher workforce. If National Board certifi- 
cation were part of a deliberate system aimed at improving the overall 
quality of instruction (if it were used, for example, as part of a revised 
tenure, compensation, or advancement system), more able candidates 
might choose to enter, or stay in, teaching. 


A third way in which National Board certification might improve av- 
erage teacher quality is by changing and improving teachers’ practic- 
es. In other words, perhaps the NBC process itself contributes in 
terms of “human capital” by developing better teachers, regardless of 
the outcome of their applications. 


We discuss, in turn, each of these roles: the role of National Board 
certification as a signal to identify high-quality teachers; the ability of 
the NBC process to screen less-effective applicants from more- 
effective applicants; and the human capital role of the NBC process 
itself in improving instructional quality through teacher professional 
development. 


Identifying high-quality teachers 


The end goal of most education policy interventions is to improve 
student outcomes, and the main mechanism for increasing student 
learning is to ensure that students are exposed to high-quality teach- 
ing. One strategy is to replace underperforming teachers with higher- 
quality teachers. While this approach might at first glance seem sim- 
ple to implement, there are many complicating issues. First and 
foremost, researchers and policy-makers continue to grapple with 
how to measure teaching effectiveness, since the observable teacher 
characteristics available in most datasets have little correlation with 
measures of student learning. 


As an alternative to using traditional teacher characteristics such as 
years of experience and highest level of education completed, Na- 
tional Board certification could be used by teachers to signal that 
tional Board certification could be used by teachers to signal that 
they are high quality. If so, principals could use this information to al- 
locate resources and staff more effectively. Perhaps within a school, 
principals might give National Board teachers more desirable as- 
signments in order to keep them in a school; principals in other 
schools might try to single out National Board-certified teachers in 
the hiring process; and so on. In short, certified teachers might have 
more flexibility both in their current positions and in the larger 
teacher labor market. 


There is evidence that obtaining National Board certification has sig- 
naling value—that teachers with National Board certification are in- 
deed of higher quality than teachers who are not certified (Cantrell, 
Fullerton, Kane, & Staiger, 2008; Cavalluzzo, 2004; Clotfelter, Ladd, & 
Vigdor, 2007; Goldhaber & Anthony, 2007). Most studies that identify 
the signaling effect of National Board certification compare certified 
teachers (NBCTs) and noncertified teachers, making statistical ad- 
justments to account for the fact that teachers who participate in cer- 
tification might be different from those who do not. 


The effect sizes these studies report are generally statistically significant, though small. 
For example, McCaffrey and Rivkin (2007) found that compared with 
other, noncertified teachers in the state, North Carolina NBCTs 
raised 4th and 5th grade math scores on the state-mandated account- 
ability test by 7 to 8 percent of a standard deviation, and reading 
scores for the same grades by 4 to 5 percent of a standard deviation. 
They further found that in Florida, NBCTs raised 4th and 5th grade 
reading scores by 2 to 4 percent of a standard deviation compared 
with noncertified teachers; Florida NBCTs had no statistically signifi- 
cant effect, however, on 4th and 5th grade math scores. These results 
are broadly consistent with those of several other studies (Clotfelter 
et al., 2007; Goldhaber & Anthony, 2007; Harris & Sass, 2006; Sand- 
ers, Ashton, & Wright, 2005). All of these studies find modest effects 
in reading, but the results are more mixed in math. 
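
To make the "percent of a standard deviation" units concrete, the following minimal sketch converts such effect sizes into raw score points. The 10-point standard deviation is a hypothetical value chosen purely for illustration, not a property of the North Carolina or Florida tests, and the function names are likewise illustrative.

```python
def sd_units_to_points(effect_in_sd, test_sd_points):
    """Convert an effect size in standard-deviation units to raw score points."""
    if test_sd_points <= 0:
        raise ValueError("standard deviation must be positive")
    return effect_in_sd * test_sd_points


def points_to_sd_units(gain_points, test_sd_points):
    """Express a raw score gain as a fraction of the test's standard deviation."""
    if test_sd_points <= 0:
        raise ValueError("standard deviation must be positive")
    return gain_points / test_sd_points


# Hypothetical test whose scores have a standard deviation of 10 points
# (an assumed value for illustration only, not taken from any cited study).
HYPOTHETICAL_SD_POINTS = 10.0

if __name__ == "__main__":
    for percent_of_sd in (4, 7, 8):
        points = sd_units_to_points(percent_of_sd / 100, HYPOTHETICAL_SD_POINTS)
        print(f"{percent_of_sd}% of an SD is about {points:.1f} points")
```

On such a test, an effect of 7 percent of a standard deviation corresponds to less than one score point per student, which is why effects of this size are described as small even when they are statistically significant.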


Research suggests, too, that the NBC process is a good screening 
mechanism for identifying high-quality teachers. The screening effect 
refers to the ability of the National Board certification process to dis- 
tinguish more-effective from less-effective teachers who apply for cer- 
tification. Consistent with this, National Board-certified teachers are more effec- 
tive than are applicants who complete the application process but do 
not achieve certification, as measured by student achievement (Caval- 
luzzo, 2004; Clotfelter et al., 2007; Goldhaber & Anthony, 2007; 
Sanders et al., 2005). In general, these studies find that students 
taught by National Board-certified teachers make statistically signifi- 
cantly larger test score gains than those taught by teachers who ap- 
plied but were not certified. Effect sizes tend to be larger for math 
than for reading (Hakel, Koenig, & Elliott, 2008). 


The literature cited here focuses almost exclusively on statistical 
comparisons in just two states, Florida and North Carolina, and on 
elementary school students. In this study, we expand on the existing 
literature—providing evidence from two additional locations, Ken- 
tucky and Chicago. We also focus exclusively on high school teachers. 


Human capital development 


In the context of education, “human capital” can be defined as the 
intrinsic capability of a teacher to engage in effective instruction. A 
teacher’s human capital stock can be increased through investment 
in education, training, and professional development activities (Eide 
and Showalter, 2010). As with any educational intervention, the quali- 
ty of professional development varies, from good to bad and every- 
thing in between. Research on professional development in Chicago 
Public Schools suggests that teachers benefit most from training that 
promotes ambitious, intellectually challenging instruction; occurs 
frequently and over time; exposes the teacher to content in his or her 
subject area; and features developments in pedagogical techniques 
(Smylie, Allensworth, Greenberg, Harris, & Luppescu, 2001). The 
U.S. Department of Education defines high-quality professional de- 
velopment as sustained and content focused, aligned with state learn- 
ing standards, and focused on developing understanding of 
“scientifically proven” instructional techniques (Yoon, Duncan, Lee, 
Scarloss, & Shapley, 2007). 


Overall, the literature shows little to no effect of most professional 
development programs on student outcomes (e.g., Harris & Sass, 
2007; Jacob & Lefgren, 2004; Podgursky, Springer, & Hutton, 2010). 
In particular, much of the funding for professional development is 
spent on “one-shot” workshops or other events not shown to translate 
into improvements in student outcomes (Garet, Porter, Desimone, 
Birman, & Yoon, 2011). 


There is some research, however, identifying characteristics of high- 
quality professional development programs (e.g., Jacob & Lefgren, 
2004), and the National Board certification process appears to have 
many of these. The NBC application process itself is sustained over 
time, and the application materials include a portfolio of lessons, as- 
sessments, and reflections prepared by the teacher and based on the 
students in his or her actual classroom. Although the original motiva- 
tion for establishing NBPTS was not to build a strong professional de- 
velopment program, it is clear that its certification process has the 
markings of one. As a result, it is reasonable to expect that participa- 
tion in the NBC process could improve a teacher’s instruction, and 
that better instruction would translate into better student outcomes. 


Here, the question we are interested in answering has to do with the 
third way in which National Board certification can improve student 
learning—that is, does participation in the NBC process itself im- 
prove that teacher’s effectiveness, regardless of whether or not the 
applicant completes it and/or achieves certification? Is the NBC pro- 
cess effective professional development? 


The extant literature leaves understudied, and unresolved, whether 
National Board certification is more than a good signal of and screen 
for identifying high-quality teachers. Many studies that try to capture 
its human capital effects compare teachers who are at different stages 
in the certification process (before applying, applying, and after ap- 
plying). They typically find that teachers’ effectiveness declines mar- 
ginally while they are applying, which could be a result of their 
spending so much time and energy on their portfolio that it distracts 
from their teaching (Clotfelter et al., 2006; 2007; Goldhaber & 
Anthony, 2007; Harris & Sass, 2006; McCaffrey & Rivkin, 2007). These 
same studies produce mixed results about gains in teacher effective- 
ness after the application process ends. 


It is worth noting that there are limitations in the current research. 
Any observable gains in student learning might simply be due to cer- 
tified teachers being better able to signal and sort into schools or to 
getting different teaching assignments after being certified. Gains 
could just be a function of certified teachers now teaching higher 
achieving students or in higher achieving schools. 


We propose a different approach to estimating the human capital ef- 
fects: comparing individual teachers against themselves over time us- 
ing a teacher fixed effects model. Although this approach has had 
limited use in the research literature (e.g., Harris & Sass, 2006), it 
should result in more accurate estimates of the ability of the National 
Board certification process to increase teacher human capital. 
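

The within-teacher comparison described above can be illustrated with a minimal sketch. It is not the model estimated in this study: the function name, the single binary "after applying" indicator, and the toy data are assumptions made for illustration, and no student or school covariates are included. The sketch subtracts each teacher's own mean from both the indicator and the outcome, so that any fixed difference between teachers drops out before the slope is computed.

```python
from collections import defaultdict


def within_teacher_estimate(rows):
    """Estimate a slope after removing teacher fixed effects.

    rows: iterable of (teacher_id, indicator, outcome), where indicator is
    0 before the (hypothetical) application and 1 afterward.
    """
    groups = defaultdict(list)
    for teacher_id, indicator, outcome in rows:
        groups[teacher_id].append((indicator, outcome))

    numerator = 0.0
    denominator = 0.0
    for observations in groups.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            # Deviations from the teacher's own mean (the "within" transformation).
            numerator += (x - mean_x) * (y - mean_y)
            denominator += (x - mean_x) ** 2

    if denominator == 0:
        raise ValueError("no within-teacher variation in the indicator")
    return numerator / denominator


# Toy data (hypothetical): teachers A and B differ in baseline effectiveness,
# but each gains 0.1 standard deviations after applying; teacher C never applies.
rows = [
    ("A", 0, 0.5), ("A", 0, 0.5), ("A", 1, 0.6), ("A", 1, 0.6),
    ("B", 0, -0.2), ("B", 1, -0.1),
    ("C", 0, 0.0), ("C", 0, 0.0),
]
estimate = within_teacher_estimate(rows)
```

Teachers whose indicator never changes (teacher C here) contribute nothing to the estimate, which is why a fixed effects design draws only on teachers observed both before and after applying.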


Changing classroom practices 


While the majority of research on the effects of National Board certi- 
fication has relied on administrative datasets (i.e., test scores), several 
studies have looked at the effect of the process on teachers’ class- 
room practices, including instruction and classroom management. 
Darling-Hammond, Atkin, Sato, Chung, Dean, and Greenwald 
(2007) used teacher-submitted lesson videotapes and student work 
samples, interviews, and surveys to assess the effects of the certifica- 
tion process on high school math and science teachers. This study 
randomly assigned teachers to two groups—one group who applied 
for National Board certification, and a second group who postponed 
their application until after the study. The study’s attrition rate was 
high: about 75 percent of the teachers in the initial sample dropped 
out, leaving a final sample of only 16 teachers. The study found some 
evidence that teachers who went through NBC improved their forma- 
tive assessment practices more than did nonparticipants: teachers 
who applied for certification were found to use a wider variety of as- 
sessment methods and better integrated assessment with instruction. 


Other studies have used survey evidence to assess the self-reported 
views of teachers who have gone through National Board certification 
(Indiana Professional Standards Board, 2002; Yankelovich Partners, 
2001). Typically, the surveys are conducted only after teachers com- 
plete the certification process, so there is no way to disentangle 
whether differences in practices were preexisting or due to participa- 
tion (Hakel et al., 2008). Nevertheless, teachers tend to report NBC 
helped them improve their teaching and increased their ability to re- 
flect on their teaching practices and incorporate the results of this re- 
flective activity into their instruction. 


We will provide further evidence of the effect of the National Board 
certification process on classroom practices through a series of class- 
room observations. Our observations of National Board applicants 
are conducted at three points in time: once as teachers begin the cer- 
tification process for the first time, once in the middle of the process, 
and once at the end. Observations are also conducted at similar times 
for a set of control teachers not participating in certification. 


These observations provide additional support in testing whether the 
National Board certification process is an effective screening or sig- 
naling mechanism, and whether it is effective professional develop- 
ment. For example, the screening effect would be supported if 
National Board applicants start out with higher ratings on their class- 
room observations than do non-applicants. This would indicate that 
teachers who self-select into the certification process tend to be high- 
er-quality teachers to begin with. The human capital hypothesis 
would be supported if NBC applicants demonstrate greater gains in 
instructional quality over time than do non-applicants. This would 
suggest that participating teachers may be learning new information 
through certification that is improving their teaching. 


Description of the data 


The setting 


The data we analyzed for this study (both the classroom observations 
of teacher instruction and the student test scores) are from public 
schools across the state of Kentucky and from the Chicago Public 
Schools district. Kentucky is an ideal state for this study. First, Nation- 
al Board enjoys strong support there. Through the efforts of teachers 
and the financial support of the Teachers’ National Certification In- 
centive Trust Fund, the state has become one of the largest producers 
of NBCTs: 1,116, or about 4 percent of the teaching workforce.³ This 
compares favorably with the national average of about 2 percent. To 
our knowledge, however, there has been no notable research on the 
effectiveness of NBCTs compared with noncertified teachers in the 
state. 


Kentucky has other appealing features, as well. It is largely rural, yet 
has suburban and urban centers, including the Louisville/Jefferson 
County metro area, with a 2010 population of about 750,000.⁴ Furthermore, 
Kentucky uses ACT’s nationally recognized Educational 
Planning and Assessment System (EPAS) to monitor growth in stu- 
dent achievement over time. The state also has a longitudinal data 
system that uses unique identifiers to track students across the state 
and over time. The data system links students to their teachers, to the 
courses they enroll in, and to their statewide assessments. 


Chicago was selected as a second location to broaden the research 
base of the study. The city of Chicago has a population of 2.8 million, 
and its very large urban school system is home to 1,158 NBCTs, or 36 
percent of all NBCTs in the state of Illinois. Like other large urban 
districts, CPS is racially and ethnically diverse. Further, CPS has been 
using EPAS since 2003 and has the results stored in a longitudinal data 
system, permitting development of study results that are comparable 
to those in Kentucky. 


3. Calculated based on data provided by NBPTS. 
4. Data from the U.S. Census (www.census.gov). 


Data sources 


Our analysis of student outcomes relies on administrative data from 
all CPS high schools and all public middle and high schools in the 
state of Kentucky. Student-level data files were provided by CPS 
through the University of Chicago Consortium on Chicago School 
Research, and the Kentucky Department of Education, respectively. 
These data files include school enrollment records, course records 
linked to the teacher of record for the course, test scores, and student 
demographic characteristics. In both locations, we have four years of 
data, allowing us to measure changes in student outcomes over time 
for three cohorts of students for each analysis. In Kentucky, the data 
are available for school years (SYs) 2007/08 through 2010/11; in 
Chicago, the data are available for 2008/09 through 2011/12. 


Student test scores 


Both CPS and Kentucky use EPAS, which consists of three tests: 
EXPLORE, PLAN, and ACT. The EXPLORE is administered in 
the fall of grade 8 in Kentucky and the fall of grade 9 in CPS. In both 
locations, the PLAN is administered in the fall of grade 10; and the 
ACT is administered in the spring of grade 11. 


According to ACT, Inc., the tests are aligned so that the score of the 
next test in the series can be predicted based on the prior test. Each 
test results in five sub-area scores: English, mathematics, reading, sci- 
ence, and writing. The composite score is the average of all of the 
sub-area scores except for writing. EPAS also has the advantage of be- 
ing nationally normed, so we know how student performance com- 
pares with other students in Illinois, for example, or around the 
country. 
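

The composite calculation described above amounts to a simple average. The sketch below is illustrative only: the function name and dictionary keys are assumptions, and any rounding that ACT applies when reporting composite scores is not described here and is not applied.

```python
def composite_score(sub_scores):
    """Average the four sub-area scores that enter the composite.

    sub_scores: dict with keys 'english', 'mathematics', 'reading',
    'science', and optionally 'writing' (which is excluded).
    """
    areas = ("english", "mathematics", "reading", "science")
    return sum(sub_scores[area] for area in areas) / len(areas)


# Hypothetical student record; the writing score does not affect the result.
example = {"english": 20, "mathematics": 22, "reading": 18, "science": 24, "writing": 8}
composite = composite_score(example)
```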


We conduct two sets of analyses for this study: the first uses the 
EXPLORE as a pretest and the PLAN score as the outcome measure; 
the second analysis uses the PLAN as a pretest and the ACT score as 
the outcome. The analysis sample includes only students who have 
both pretest and posttest scores. The majority of students took each 
test one time; however, if a student has more than one test score, we 
use the score from the date of the earliest test, so the results are 
comparable to those for students who took the test only once. 
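

The rule for students with more than one score can be sketched as follows. The record layout (student identifier, test date, score) and the function name are assumptions for illustration, not the structure of the administrative files.

```python
from datetime import date


def earliest_scores(records):
    """Keep, for each student, the score from the earliest test date.

    records: iterable of (student_id, test_date, score) tuples.
    Returns a dict mapping student_id to the retained score.
    """
    first = {}
    for student_id, test_date, score in records:
        if student_id not in first or test_date < first[student_id][0]:
            first[student_id] = (test_date, score)
    return {student_id: score for student_id, (_, score) in first.items()}


# Hypothetical records: student 1 tested twice, student 2 once.
records = [
    (1, date(2009, 10, 15), 17),
    (1, date(2009, 9, 20), 15),
    (2, date(2009, 9, 20), 19),
]
retained = earliest_scores(records)
```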


We standardized the scale scores for each test by subtracting the na- 
tional mean score on the corresponding test from the student’s test 
score, and then dividing by the national standard deviation. This allows 
the magnitude of the effects to be directly compared across subjects, 
tests (EXPLORE, PLAN, ACT), and locales (CPS, Kentucky). 
Results are examined separately for English, math, and science. We 
also examine the results for the three subjects combined.⁵ 
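

The standardization step can be written out as a short sketch. The national mean and standard deviation used in the example are placeholders, not the published norms for any EPAS test, and the function name is an assumption.

```python
def standardize(score, national_mean, national_sd):
    """Express a scale score in national standard deviation units."""
    if national_sd <= 0:
        raise ValueError("national_sd must be positive")
    return (score - national_mean) / national_sd


# Placeholder norms: a score of 18 against a mean of 16 and an SD of 4
# lies half a standard deviation above the national mean.
z = standardize(18, national_mean=16, national_sd=4)
```

Because every test is rescaled by its own national norms, an estimated effect of, say, 0.05 has the same interpretation (5 percent of a national standard deviation) regardless of the subject, test, or location.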


Student information 


Both CPS and Kentucky administrative data collected on students include 
basic demographic information, such as gender and 
race/ethnicity, as well as socioeconomic status (based on 
free/reduced-price lunch [FRL] eligibility) and special education status 
(students with Individualized Education Programs [IEPs]). Date 
of birth was used to calculate each student’s age at the beginning of 
each school year. In addition, Kentucky has an indicator for English 
as a Second Language (ESL) status, and the number of days the stu- 
dent was absent during the school year. 


The analytic sample in Chicago includes 69,741 students for the 
PLAN analysis and 48,546 for the ACT analysis. In Kentucky, the 
sample sizes are 80,490 for the PLAN and 114,465 for the ACT. 
(Some 34,903 Kentucky students are in both the PLAN and the ACT 
samples.) 
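

Because some students appear in both analysis samples, the number of unique students is the sum of the two sample sizes minus the overlap. The short sketch below checks this arithmetic against the totals reported in the notes to Table 2 (160,052 unique students in Kentucky and 89,002 in CPS, with 29,285 CPS students in both samples); the function name is an assumption.

```python
def unique_students(plan_n, act_n, both_n):
    """Count unique students across two overlapping samples."""
    return plan_n + act_n - both_n


kentucky_total = unique_students(plan_n=80_490, act_n=114_465, both_n=34_903)
cps_total = unique_students(plan_n=69_741, act_n=48_546, both_n=29_285)
```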


Teacher information 


NBPTS provided certification application data for teachers in Chica- 
go Public Schools and Kentucky starting with the 2000 applicant co- 
hort and ending with the 2012 applicant cohort. These data include 
application date(s), number of times applied, and the outcome of 
each application for teachers of all subjects and grade levels. We also 
have information about the subject area and age category for certifi- 
cation. 


5. We did not examine test scores in reading or writing because those topics 
do not align to a specific teacher. 


Over this 13-year period, there were 4,658 unique applicants from 
CPS, and 44 percent of them achieved National Board certification. 
From Kentucky there were 4,746 unique applicants, and 54 percent 
of them achieved National Board certification. Most applicants ap- 
plied one time (71 percent for CPS, 67 percent for Kentucky); only 
about 1 percent of teachers applied more than three times. 


There is no unique teacher ID number in the data file from NBPTS 
that can be used to merge the file with the teacher records in the 
administrative data files from CPS or KDE. Instead, we matched the 
records using teachers’ first names, last names, and email addresses. 
We started by identifying any exact matches in either email address or first 
and last name in both files. Then we looked for cases where the 
names were similar but not exact. We manually checked these rec- 
ords and compared other characteristics in the two files, such as 
school name and subject area, to determine whether the records ap- 
peared to belong to the same person. The match rate could be ex- 
pected to be less than 100 percent because our administrative data 
files include only public school teachers, while the file from NBPTS 
includes other applicants such as administrators and private school 
teachers. For the years of our analysis, the match rate is 83 percent in 
Kentucky and 78 percent in Chicago. 
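

The two-stage matching logic described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the field names, the similarity measure (Python's `difflib.SequenceMatcher`), and the 0.85 threshold are all hypothetical, and the manual review of near matches against school name and subject area is represented only by a "review" label.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def classify_pair(nbpts_record, admin_record, threshold=0.85):
    """Classify a candidate pair as 'exact', 'review', or 'no_match'.

    Each record is a dict with 'first', 'last', and 'email' keys
    (hypothetical field names).
    """
    email_a = normalize(nbpts_record["email"])
    email_b = normalize(admin_record["email"])
    name_a = normalize(nbpts_record["first"] + " " + nbpts_record["last"])
    name_b = normalize(admin_record["first"] + " " + admin_record["last"])

    # Stage 1: exact match on email address or on first and last name.
    if (email_a and email_a == email_b) or name_a == name_b:
        return "exact"
    # Stage 2: similar but not identical names are set aside for manual review.
    if SequenceMatcher(None, name_a, name_b).ratio() >= threshold:
        return "review"
    return "no_match"


applicant = {"first": "Katherine", "last": "Smith", "email": "ksmith@example.org"}
same_name = {"first": "katherine", "last": "smith", "email": ""}
near_name = {"first": "Catherine", "last": "Smith", "email": ""}
other = {"first": "Robert", "last": "Jones", "email": "rjones@example.org"}
labels = [classify_pair(applicant, r) for r in (same_name, near_name, other)]
```

Routing near matches to manual review rather than accepting them automatically trades some labor for fewer false links, which matters when the linked records drive teacher-level estimates.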


In Chicago, the National Board data could be linked to the CPS per- 
sonnel data, giving us access to a richer set of teacher covariates. The 
personnel data include characteristics such as number of years teach- 
ing in the district, level of education, area of teacher certification, 
and demographic attributes. Similar data are not available for teach- 
ers in Kentucky. 


In order to link students to their teachers, we also used transcript files 
that account for all the courses in which a student enrolls and the 
teachers of each course. In Chicago, each course in the transcript file 
was coded as “core” (English, mathematics, or science—to map to the 
EPAS sub-area test scores) or “non-core.” For this analysis we restrict 
the dataset to core courses. Core courses all count toward the Illinois 
state graduation requirements. In Kentucky, we coded courses as 
English, math, or science, based on standardized state course codes. 
We also reviewed course descriptions provided by KDE and coded 
courses as primary or elective based on these descriptions. 


For both Chicago and Kentucky, we include only teachers of primary 
courses in the analysis. If the student took both a primary course and 
an elective course in a particular subject area, we included the record 
from the primary course in the analysis and included a dummy varia- 
ble in the model to indicate that the student was also enrolled in an 
elective course in the same subject area. In Kentucky, we also coded 
whether the course level is unknown, basic (e.g., remedial courses), 
regular, or advanced (e.g., honors, Advanced Placement, and Inter- 
national Baccalaureate). 


Students who have more than one primary course in the same subject 
area taught by more than one teacher were flagged as having multi- 
ple teachers. Conversely, students without any courses in the core 
subject area were flagged as having no teachers. While we cannot 
identify the individual teacher responsible for teaching these students 
in those particular semesters/years, we do not want to drop them 
from the analytic dataset. (See Appendix D for more information on 
construction of the analytic file.) 
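

The flagging rule for students with multiple or no teachers in a subject can be sketched as below. The record layout, field names, and flag labels are assumptions for illustration; Appendix D describes the actual construction of the analytic file.

```python
def teacher_flag(course_records, student_id, subject):
    """Return 'single', 'multiple', or 'none' for a student in one subject.

    course_records: iterable of dicts with 'student_id', 'subject',
    'teacher_id', and 'is_primary' keys (hypothetical field names).
    Only primary courses are considered.
    """
    teachers = {
        record["teacher_id"]
        for record in course_records
        if record["student_id"] == student_id
        and record["subject"] == subject
        and record["is_primary"]
    }
    if not teachers:
        return "none"
    if len(teachers) > 1:
        return "multiple"
    return "single"


# Hypothetical transcript rows for one student.
courses = [
    {"student_id": 1, "subject": "math", "teacher_id": "T1", "is_primary": True},
    {"student_id": 1, "subject": "math", "teacher_id": "T2", "is_primary": True},
    {"student_id": 1, "subject": "science", "teacher_id": "T3", "is_primary": True},
    {"student_id": 1, "subject": "english", "teacher_id": "T4", "is_primary": False},
]
flags = {s: teacher_flag(courses, 1, s) for s in ("math", "science", "english")}
```

Keeping flagged students in the file, rather than dropping them, preserves the sample while letting the model account for the fact that no single teacher can be credited with their outcomes.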


School information 


Most of the school-level data we use for Kentucky come from the 
Common Core of Data housed at the U.S. Department of Education’s 
National Center for Education Statistics. The Common Core of Data 
makes publicly available characteristics about each school across the 
country, and the data can be aggregated up to the district, state, or 
national level. Covariates include school size, student-teacher ratio, 
student-administrator ratio (district level), percent Black students, 
percent Hispanic students, percent FRL students, per pupil spending, 
and school locale. For CPS, we calculate school-level variables from 
the student-level data including averages of student neighborhood 
socioeconomic indices. We also use the EPAS data provided by CPS 
and KDE to calculate school-level average scores on ACT, PLAN, and 
EXPLORE in each subject area. 


6. For the Kentucky ACT sample (114,465 students): in math, 3.2 percent 
of students attended a block-scheduled course, 9.0 percent had multiple 
teachers, and 5.6 percent had no teacher (could not be matched). For 
English, the percentages were 3.6 percent block, 6.7 percent multiple, 
and 5.0 percent missing. For science, they were 3.8 percent block, 23.3 
percent multiple, and 9.8 percent missing. For the CPS PLAN sample 
(69,741 students), 12 percent of students had multiple teachers in math, 
while 1.5 percent did not have a designated teacher and 0.2 percent had 
no math class. For English, the percentages were 15 percent multiple, 
1.6 percent missing, and 0.1 percent no English class; and for science, 7 
percent multiple teachers, 1.4 percent missing, and 1.5 percent no science 
class. See Appendix D for additional information. 


Characteristics of sample schools, teachers, and students 


The percentage of students in the sample who ever had an NBCT 
during the timeframe of the analysis is 11 percent in Kentucky and 28 
percent in CPS. There are statistically significant differences between 
students who had a class with one or more NBCTs and students who 
did not on all of the characteristics we examined. As shown in Table 
2, students who never had an NBCT had lower test scores on the pre- 
tests (EXPLORE and PLAN) in math, English, and science; and had 
higher rates of absences from school than students taught by an 
NBCT. Students who never had an NBCT were also less likely to be 
Black, Hispanic, or female and more likely to be categorized as FRL, 
IEP, or ESL. This indicates that the population of students taught by 
NBCTs differs from students taught by non-certified teachers. 


Table 2: Comparison of student characteristics, by whether the student ever had a National 
Board-certified teacher. 


                                      Kentucky                                  CPS
Characteristic                        Never had NBCT  Had NBCT    Difference    Never had NBCT  Had NBCT    Difference

Average EXPLORE pretest score in
  math (PLAN sample)                  14.7            15.4        -1.0*         14.5            17.0        [illegible]
Average EXPLORE pretest score in
  English (PLAN sample)               14.0            15.1        -1.1*         13.5            16.3        -2.8*
Average EXPLORE pretest score in
  science (PLAN sample)               16.1            16.9        -0.8*         15.7            [illegible] -2.0*
Average PLAN pretest score in
  math (ACT sample)                   16.8            18.2        -1.4*         15.1            17.7        -2.6*
Average PLAN pretest score in
  English (ACT sample)                16.1            [illegible] -1.2*         14.4            16.9        -2.5*
Average PLAN pretest score in
  science (ACT sample)                17.8            18.7        -1.0*         16.2            18.0        -1.8*
Average number of absences per year   [missing]       [missing]   [illegible]   [missing]       NA
% Black                               8.6             13.8        -5.2*         [missing]       31.5        12.2*
% Hispanic                            [illegible]     3.2         -1.1*         [missing]       43.8        0.2
% Female                              50.0            51.3        -1.3*         [missing]       55.4        -3.6*
% Free or reduced-price lunch         47.8            38.8        [illegible]   [missing]       63.0        6.7*
% Individualized Education Program    7.8             3.8         4.0*          [missing]       5.1         7.3*
% English as a Second Language        2.6             4.3         -1.7*         [missing]       NA

NOTES: N=160,052 students in Kentucky (34,903 in both the PLAN and ACT samples) and 89,002 students in CPS 
(29,285 in both the PLAN and ACT samples). Of Kentucky students, 16,853 had an NBCT in math, English, or sci- 
ence during the analysis timeframe. Of CPS students, 24,715 had an NBCT in math, English, or science. Signifi- 
cance was calculated using two-tailed t-tests of mean ratings for students who had an NBCT during the analysis 
timeframe compared with students who did not. *=difference is statistically significant at the .05 level. ~=difference 
is statistically significant at the .1 level. 


Approximately 5 percent of teachers in Kentucky and 17 percent of 
teachers in CPS in the sample ever applied for National Board certification 
during the timeframe of the analysis (see Table 3). Among 
those teachers who do apply in Kentucky, 52 percent achieve certification, 
36 percent do not achieve, and 12 percent have unknown 
outcomes because they completed the certification process after the 
analysis period. In CPS, 48 percent of NBC applicants achieve certification, 
21 percent do not achieve, and 31 percent are unknown because 
they withdrew from the process or completed after the last date 
reported from NBPTS. 


Table 3: Number and percentage of teachers in the sample who ever applied 
for National Board certification and who achieved it during 
the timeframe of the analysis. 


                                   Kentucky            CPS
                                   N        %          N        %

Teacher ever applied for NBC?
  Yes                              423      4.6        665      16.5
  No                               8,839    95.4       3,357    83.5
Teacher ever achieve NBC?
  Yes                              221      52.3       321      48.3
  No                               153      36.2       138      20.8
  Unknown                          49       11.6       206      31.0


NOTE: N=9,262 teachers in Kentucky and 4,022 teachers in CPS. 


The percentage of schools that had an NBCT during the analysis pe- 
riod was 64 percent in Kentucky; 84 percent of schools in the CPS 
sample had an NBCT during the analysis period. As shown in Table 4, 
Kentucky schools with NBCTs are more likely to be in suburban areas, 
and less likely to be in rural areas than schools without NBCTs. 
Thus, it is not surprising that Kentucky schools with NBCTs have 
larger total enrollments than schools without NBCTs. Schools in Kentucky 
with NBCTs also have fewer FRL students and higher test scores 
on the EXPLORE and the PLAN than schools without NBCTs. Similarly, 
Chicago schools that had at least one NBCT are larger on average 
than schools without any NBCTs. Chicago schools with any 
NBCTs also have somewhat higher average test scores. 


Table 4: Comparison of school characteristics, by whether the school ever had a National Board-certified 
teacher. 


                                                Kentucky                             CPS
Characteristic                                  No NBCT      NBCT      Difference    No NBCT      NBCT      Difference

Total enrollment                                667.6        939.9     -272.3*       532.5        1134.6    -602.1*
Student-teacher ratio                           17.4         18.2      -0.7          15.6         15.3      0.3
Student-administrator ratio (in district)       [illegible]  221.8     -9.6          NA           NA
% Black students                                12.2         13.1      -0.8          74.6         54.8      19.7
% Hispanic students                             1.8          2.4       -0.6          18.9         35.0      -16.1
% Free or reduced-price lunch students          61.7         49.8      12.0*         92.2         85.3      6.9
Per pupil spending ($)                          10,294       10,392    99*           NA           NA
% Urban schools                                 13.3         18.6      -5.2          NA           NA
% Suburban schools                              7.8          18.0      -10.2*        NA           NA
% Town schools                                  23.3         21.6      1.8           NA           NA
% Rural schools                                 55.6         41.9      13.6*         NA           NA
School-level average EXPLORE score in English   13.9         15.3      -1.5*         [missing]    13.5      -1.8~
School-level average EXPLORE score in math      14.5         16.1      -1.6*         [missing]    14.1      -1.2~
School-level average EXPLORE score in science   16.4         17.2      -0.9*         [missing]    15.5      [missing]
School-level average PLAN score in English      14.7         [illegible] -1.1*       [missing]    14.4      [missing]
School-level average PLAN score in math         [missing]    13.9      -0.8*         [missing]    15.2      [missing]
School-level average PLAN score in science      [missing]    15.5      -0.8*         [missing]    16.6      [missing]


NOTES: N=359 schools in Kentucky and 100 schools in CPS. Significance was calculated using two-tailed t-tests of 
mean ratings for schools that had an NBCT during the analysis timeframe compared with schools that did not. 
*=difference is statistically significant at the .05 level. ~=difference is statistically significant at the .1 level. 


Classroom observations 


One aspect of our evaluation involves classroom observations of a 
sample of NBC candidate teachers and a sample of other teachers 
with similar characteristics in similar classroom settings who are not 
pursuing certification. The goal of this part of the study is to chart 
and compare applicants and non-applicants and to examine these 
teachers’ use of effective instructional practices over time. 


Changes in instructional quality are examined for 27 math and sci- 
ence teachers in Kentucky and Chicago over a three-semester period. 
Where possible, we observed each teacher twice in the same semester 
to improve the quality of the data, using the average of the two obser- 
vation scores for our analysis. However, it was not always possible to 
arrange for two observations each semester due to scheduling con- 
straints. 


We use the observations to address the following research question: 


1. Does the NBPTS certification process influence teachers’ class- 
room practices? 


Comparing any gains in instructional quality for the two samples lets 
us draw conclusions about the effects of participation in certification. 
This study design requires a comprehensive observation instrument 
to document what is observed, a tool for assigning numeric scores to 
the instructional practices observed, and consistent and reliable data 
collection and scoring procedures to maintain the internal validity of 
these data. 


Classroom observation instrument 


We selected the Leadership by Design (LBD) classroom observation 
instrument for use in the study (see Appendix A). This instrument 
has been widely used in Kentucky and elsewhere; classroom observation 
data have been collected using the LBD instrument for more 
than 3,000 teachers in more than 250 elementary, middle, and high 
schools in seven different states. Projects using the LBD include work 
funded by the U.S. Department of Education and the National Science 
Foundation. The LBD also has been adopted by the National 
Science Teachers Association as a program improvement tool to help 
assess and improve the quality of instruction in middle school and 
high school classrooms. 


7. We slightly modified the instrument by moving the classroom context 
indicators from the front of the instrument to the end. This change was 
made so that the evaluator would not be distracted by the classroom 
context while evaluating teaching quality. 


LBD measures the quality of instructional practices in science and 
math, as well as capturing information about the classroom setting. 
The instrument is completed during observations lasting 45 to 90 
minutes by trained observers with subject-matter expertise. The rubric 
itself consists of 33 elements spanning nine dimensions: lesson 
overview, instructional overview, questioning, classroom atmosphere, 
concept development, teacher's content knowledge, learning climate, 
classroom management, and assessments. 


The data collected through the LBD are descriptive in nature. Observers 
make notes, for example, about the types of questioning techniques 
used by the teacher, the amount of student investigation or 
research, the type of basic and higher-level skills being developed, 
and the teacher’s use of formative and/or summative assessments to 
measure student learning. The LBD acts as a memory device for the 
observer; the data collected from the LBD are not used directly to 
rate the quality of instruction. 


Rubric for scoring classroom observations 


To assign numeric scores to the observation data collected with the 
LBD, we developed a “LBD Classroom Observation Rubric” for this 
study (see Appendix B). Prior to using the rubric in our evaluation, 
the research team piloted it using observations of a small sample of 
teachers (see Appendix C). The pilot test did not identify any problems 
in transferring the observation data to the rubric, and also confirmed 
that the scoring data produced by the rubric were internally 
consistent. 


The rubric consists of nine instruction-related subscales, plus an 
overall rating. The subscales are based on the average rating of three 
to five specific items aligned with the LBD instrument. Each item on 
the rubric is scored on an integer scale of 1-5, with 5 being the high- 
est rating and 1 the lowest. Scores of 3 and below show areas needing 
improvement. The rubric also has a subscale for the classroom’s phys- 
ical setting, collected to provide baseline contextual information and 
not used to evaluate the teacher or quality of instruction. 
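

The scoring arithmetic described above (item ratings averaged into subscales, and two observations in a semester averaged when both are available) can be sketched as follows. The function names and example ratings are assumptions for illustration and are not taken from Appendix B.

```python
def subscale_score(item_ratings):
    """Average three to five integer item ratings, each on the 1-5 scale."""
    if not 3 <= len(item_ratings) <= 5:
        raise ValueError("a subscale is based on three to five items")
    for rating in item_ratings:
        if not (isinstance(rating, int) and 1 <= rating <= 5):
            raise ValueError("each item is rated on an integer scale of 1-5")
    return sum(item_ratings) / len(item_ratings)


def semester_score(observation_scores):
    """Average the scores from one semester (one or two observations)."""
    if not observation_scores:
        raise ValueError("at least one observation is required")
    return sum(observation_scores) / len(observation_scores)


# Hypothetical ratings for one subscale in two observations of the same teacher.
first_visit = subscale_score([4, 5, 3])
second_visit = subscale_score([3, 4, 4, 3])
combined = semester_score([first_visit, second_visit])
```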


After rating each of the items on the rubric, observers assign an over- 
all rating of the quality of the instruction. This overall rating takes in- 
to account the observer’s overall impression, including the 
effectiveness of instruction, alignment with objectives and standards, 
student engagement, and instruction to develop students’ higher- 
order thinking skills. Observers are required to write comments cor- 
responding to the overall rating to provide context for understanding 
why the rating was selected. Table 5 provides an example of the rating 
rubric for the overall classroom observation rating. 


Table 5: LBD Classroom Observation Rubric ane scale for overall classroom observation a 


5 Instruction was of high quality and effective for all students; evidence that instruction was 
based on clearly defined objectives that were fully aligned with standards; all students were 
engaged in activities requiring higher level thinking skills 


4 Instruction was of high quality and effective for most students; evidence that instruction was 
based on clearly defined objectives that were aligned with standards; most students were en- 
gaged in activities that required higher level thinking skills. 


3 Instruction was of good quality and effective for many students; instruction appeared to be
based on student objectives somewhat aligned to standards; some students had an opportunity
for higher level thinking skills development.

2 Instruction was of mediocre quality and effective for only a small portion of the students; little
evidence that instruction was based on student objectives; instruction had minimal impact on
student learning.


1 Instruction was of poor quality and was not effective for any students; no evidence that in- 
struction was based on student objectives; learning was not based on instruction provided. 


Recruitment of teachers 


Each year NBPTS provided us with contact information for any new 
NBC applicants in Chicago and Kentucky. All new applicants in math
and science at the high school level were contacted and asked to participate
in the study. Teachers who agreed to participate committed
to being observed twice per semester for three consecutive semesters. 
These semesters correspond to the beginning, middle, and end of 
the National Board certification cycle. 


Even after repeated attempts, only about half of the teachers we con- 
tacted agreed to participate. Teachers have many competing de- 
mands on their time, particularly those engaging in a time- 
consuming endeavor such as National Board certification. In addi- 
tion, many teachers we contacted expressed reluctance at having an 
unknown observer come into their classroom. 


Once the NBC applicants were recruited for the classroom observa- 
tions, the principal of each school was asked to identify a similar 
(control) teacher in the same school who was not an NBC applicant. 
The research team requested that the teachers selected for the con- 
trol group be state certified in math or science and have at least three 
years of teaching experience (to match NBC eligibility require- 
ments). We do not have any evidence, but we expect that the princi- 
pal probably selected as the control someone who was perceived to 
be a “good teacher,” so there could be no perception that the school 
was not doing a good job. We also expect that the control teachers 
were selected because they were confident and willing to have an out- 
side observer in their classroom. This means that the control group 
may include higher-quality teachers than the “average” teacher. Four 
NBC applicants had no matched control teacher because the princi- 
pal of their school declined to name one. 


We were able to recruit 32 teachers, whom we observed at least once; 
27 of these teachers were observed all three times. Due to the small 
number of new NBC applicants who agreed to participate in the 
study, we recruited over several semesters. 


Observations were conducted from the spring semester of SY
2010/11 through the fall semester of SY 2013/14. Table 6 shows that a
total of 27 teachers were observed at all three time points: 9 in math
and 18 in science. Fifteen of the teachers were NBC applicants
and 12 were not. The analysis includes only teachers observed at all
three time points, so that the sample is the same for the comparisons
at each time point. Five additional teachers were observed only once
or twice (e.g., because the teacher retired or left the school); these
were excluded from the analysis.


Table 6: Number of teachers observed at three time points, by location.


                                          Total
Math                      1        8          9
Science                  10        8         18
NBC applicants            6        9         15
Non-NBC applicants        5        7         12
Total                    11       16         27


Classroom observation process 


The LBD and rubric data were collected during prearranged class- 
room visits by site observers. Observers were not informed which 
teachers were NBC applicants and which were not. The developer of 
LBD (Co-Principal Investigator Dr. Stephen Henderson) trained all 
observers annually to use the LBD and scoring rubric. All observers 
are experienced math or science teachers who also have used the 
LBD instrument for previous studies. 


Participating teachers were instructed to teach the same lesson they 
would normally teach on the day of the visit and to use the same 
techniques/materials they would normally. During the classroom ob- 
servation, the observer filled out the LBD instrument, marking items 
as they were observed. While in the classroom, the observer also
looked for the following materials, where used and available: text and
other instructional resources currently being used; any student
workbook(s) used; a sample assessment given by the teacher; and a
student laboratory manual or portfolio.


Following the observation, the teachers were asked to participate in a 
5- to 10-minute debrief interview with the observer. Questions asked 
include the following: 


• What were the goals of today’s class?

• What went well in this class? What didn’t go well?

• What are your thoughts on goals for tomorrow’s class?


After the site visit, the observer reflected on the observation and,
using the completed LBD instrument, filled out our LBD Classroom
Observation Rubric. The classroom materials and the discussion with 
the teacher also enabled the observer to better understand what was 
observed, facilitating more accurate completion of the rubric. 


Completed LBD observation instruments and scoring rubrics were 
collected by Dr. Henderson from the classroom observers following 
their classroom visits. Copies of the completed data collection in- 
struments were provided to CNA for independent analysis. 


Results: Baseline ratings for NBC applicants and non-applicants


We begin our discussion of the results by describing the baseline
(initial) observations for all teachers, comparing NBC applicants and
non-applicants. This section examines whether National Board appli- 
cants have higher ratings of instructional quality than non-applicants 
just as the former start the steps in the certification process. 


The mean difference between NBC applicant and non-applicant 
scores was calculated, and statistical significance was tested using a 
two-tailed t-test for unequal sample sizes and unequal variances.
Figure 2 shows the average overall rating scores for all NBC applicants
and non-applicants, as well as for math and science teachers separately.
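A two-tailed t-test for unequal sample sizes and unequal variances is commonly known as Welch's t-test. The sketch below shows how the test statistic and its approximate degrees of freedom are computed. The ratings are hypothetical values chosen for illustration, not the study data, and the function name is ours.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical overall ratings (illustrative only, not the study data).
nbc = [5, 4, 4, 3, 5, 4, 3, 4]
non = [3, 4, 2, 3, 4, 3]
t_stat, df = welch_t(nbc, non)
```

Converting the statistic to a two-tailed p-value requires the t distribution; one option, assuming SciPy is available, is `scipy.stats.ttest_ind(nbc, non, equal_var=False)`.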


There is some evidence that for all teachers, the overall ratings are 
higher for National Board applicants than for non-applicants (mean 
of 3.8 versus 3.2; a difference of 0.6, p<.10). However, the difference 
was statistically significant only for math teachers, with those NBC
applicants’ average overall rating (4.3) being a full point higher than 
the non-applicants’ overall rating (3.3). 


Figure 2: Average overall ratings for the baseline observations for NBC applicants and
non-applicants, overall and by subject.


[Chart omitted. Groups: All teachers, Math teachers, Science teachers. Series: NBC applicants, Non-applicants.]


NOTES: Scale ranges from 1 (low) to 5 (high). N=27 teachers. Significance was calculated using two-tailed t-tests of 
mean ratings for NBC applicants compared with non-applicants. *=difference is statistically significant using a 95 
percent confidence level. ~=difference is statistically significant using a 90 percent confidence level. 


Next, we compared the baseline observations for NBC applicants and 
non-applicants on each of the nine rubric subscales. Table 7 shows 
that there is substantial variation in the range of scores for both NBC 
applicants and non-applicants. For both groups, the minimum scores 
for most subscales are below 3.0, and all of the maximum scores are 
between 4.5 and 5.0. The standard deviations range from 0.74 to 1.21 
on a 5-point scale.


Table 7: Descriptive statistics for baseline observation scores for NBC applicants and
non-applicants for each of the nine subscales and the overall rating scale on the LBD
Classroom Observation Rubric.


                          NBC applicants                  Non-applicants
                          Min.   Max.   Mean   Std. Dev.  Min.   Max.   Mean          Std. Dev.
Overall rating            3.00   5.00   3.75   1.03       2.00   5.00   3.17          0.86
Lesson overview           3.20   5.00   4.17   0.81       1.60   4.60   3.40          0.86
Instructional overview    2.83   5.00   3.88   0.89       1.33   5.00   3.28          0.89
Questioning               2.84   5.00   4.04   0.97       1.00   4.75   3.15          1.04
Classroom atmosphere      2.10   4.50   3.63   0.78       2.00   5.00   3.61          0.81
Higher-order skills       1.50   5.00   3.18   1.20       1.00   5.00   [illegible]   1.19
Content knowledge         2.50   5.00   3.98   0.87       2.00   5.00   3.33          0.96
Positive climate          2.60   5.00   4.43   1.20       2.60   4.60   3.80          0.74
Implements instruction    2.67   5.00   3.99   0.87       1.33   5.00   2.97          1.21
Assesses learning         1.67   5.00   3.63   0.77       1.67   4.67   2.86          1.04


NOTE: Scale ranges from 1 (low) to 5 (high). N=27 teachers. 


As shown in Figure 3, the average score for NBC applicants is statisti- 
cally significantly higher than the average score for non-applicants on 
six of the nine rubric subscales: lesson overview, questioning, content
knowledge, positive climate, implements instruction, and assesses 
learning. Variation in scores was greatest for the questioning and 
higher-order skills subscales, which ranged along the full scale of the 
rubric, with a 4-point difference between the minimum (1) and the 
maximum (5) scores.


Figure 3: Range of baseline observation scores for NBC applicants and non-applicants for each
of the nine subscales on the LBD Classroom Observation Rubric.


[Chart omitted. Vertical axis runs from 0.0 to 5.0.]


NOTES: Scale ranges from 1 (low) to 5 (high). N=27 teachers. Significance was calculated using two-tailed t-tests of
mean ratings for NBC applicants compared with non-applicants. *=difference is statistically significant using a 95 
percent confidence level. ~=difference is statistically significant using a 90 percent confidence level. 


Below we describe the subscales for which there was a statistically sig- 
nificant difference between NBC applicants and non-applicants at 
baseline. We also provide an example of an observer’s description 
from a geometry class observation of an NBC applicant in math who 
had a high rating (4.5 or above) on each of these subscales. 


• Lesson overview: NBC applicant mean=4.2 versus non-applicant
mean=3.4, a difference of 0.8 (p<.05). This rating
takes into account communication of lesson objectives, use of 
instructional resources to achieve the objectives, presentation 
of content in an accurate and grade-level-appropriate manner, 
place of the lesson in the instructional sequence, and choice of 
seating arrangements for the lesson. In the observation for the 
sample teacher’s class, the observer commented, 


The lesson on finding patterns on a unit circle was 
completely explored through pre-assessment, hands-on 
investigation, printed charts and diagrams, and technology.
Students were seated in functioning groups for
both individual and group accountability. 


• Questioning: NBC applicant mean=4.0 versus non-applicant
mean=3.1, a difference of 0.9 (p<.05). This rating takes into account
the quality of the questions, student participation in
questioning, use of strategic or target-centered questions for 
formative assessment, and feedback to students on responses. 
In the observation for the sample teacher’s class, the observer 
commented, 


Questions were purposeful and designed to discover 
misconceptions. All students were expected to be ac- 
countable in answering questions either in whole group 
discussions or in small groups. Wait time was not partic- 
ularly intentional, but the type of questions required 
students to reason, and feedback was qualitative. 


• Content knowledge: NBC applicant mean=4.0 versus non-applicant
mean=3.3, a difference of 0.7 (p<.10). This rating includes
communicating content knowledge to students, connecting
content to life experiences, using instructional
strategies appropriate for content, and guiding students to
understand lesson content from various perspectives. The
observer noted,


The teacher is exceptional and is able to orchestrate the 
various stages of the lesson seemingly effortlessly. He 
made a couple of realistic connections with the clock 
(unit circle and degrees) and periodic behavior (the si- 
ne curve). Students considered patterns on the unit cir- 
cle chart, diagram of circle, sine curve using coordinate 
plane, string, and spaghetti, and on graphing calculator. 


• Positive climate: NBC applicant mean=4.4 versus non-applicant
mean=3.8, a difference of 0.6 (p<.05). To achieve a high rating, 
teachers must communicate high expectations, establish a posi- 
tive learning environment, value and support student diversity, 
foster mutual respect between teacher and students and among 
students, and provide a safe environment for learning. The ob- 
server commented, 


Students knew they were expected to accomplish tasks 
in assigned periods of time, and activities changed often 
to meet the needs of all students. The teacher had in- 
credibly good rapport with students. 


• Implements instruction: NBC applicant mean=4.0 versus non-applicant
mean=3.0, a difference of 1.0 (p<.05). To achieve a
high rating, teachers must implement instruction based on 
student needs and assessment data, use resources effectively, 
and manage instruction to facilitate higher-order thinking. The 
observer commented, 


As the teacher monitored groups, he asked questions to 
determine if clarification was needed or if students were 
ready to explain their new pattern to the whole group, 
or figure out their misconceptions. While students did 
the warm-up, the teacher took roll and spot-checked 
every student’s homework, and collected garbage and 
materials as students did an assessment at the end of 
class. Each student in small groups had a task to accom- 
plish. Students had ample purposeful independent and 
group processing and reflection time. 


• Assesses learning: NBC applicant mean=3.6 versus non-applicant
mean=2.9, a difference of 0.7 (p<.10). This rating includes
using assessments aligned with learning objectives, using
a variety of formative and summative assessments to measure 
learning, and adapting assessments to accommodate diverse 
learning needs. The observer noted, 


Besides the warm-up and homework checks, students 
took a 3-minute pre-assessment on their knowledge of 
the unit circle and then checked it themselves with the 
chart, answered questions asked by the teacher 
throughout the activities and by other students (teacher 
directed others to answer questions), reported on their 
patterns discovered, and demonstrated their learning 
with a writing assignment at the end (choice of 2 
prompts—explain the pattern on the calculator or ex- 
plain the concept in a short paragraph). 


Results: Change in ratings over time for NBC applicants and non-applicants


To examine the effects of the National Board certification process, we 
compared the ratings from the baseline observations with the subse- 
quent revisit observations, to see how the teachers’ ratings change 
over time. Figure 4 shows the average overall ratings on the baseline,
second, and third observations for NBC applicants and
non-applicants.


There are minimal differences between the baseline and subsequent 
observations for both groups of teachers, and none of the differences 
is statistically significant. This suggests that undergoing National 
Board certification does not have a distinguishable effect on teachers’ 
overall quality of instruction. 


Figure 4: Average overall ratings over time for NBC applicants and non-applicants.


NOTES: Scale ranges from 1 (low) to 5 (high). N=27 teachers. Significance was calculated using two-tailed t-tests of
mean ratings for the baseline observation compared with each subsequent observation. *=difference is statistically
significant using a 95 percent confidence level. ~=difference is statistically significant using a 90 percent
confidence level.


We do not necessarily expect the National Board certification process 
to significantly affect the teachers’ classroom practices on all of the 
LBD subscales, which is why we examine the differences separately 
for each subscale. It is also possible that certain subscales may be af- 
fected at different points in the application process, or that teachers’ 
timing of implementing certain instructional elements may vary. 
Thus, we conduct comparisons both between the NBC applicants’ 
second observation and baseline observations and between the NBC 
applicants’ third observations and baseline observations. We also 
check whether there are any changes over time in the ratings for the 
non-applicants, although we do not anticipate significant differences
since these teachers are working under business-as-usual
circumstances.
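The two comparisons described above (second observation versus baseline, and third observation versus baseline) can be sketched as follows. The ratings are hypothetical and the function is ours for illustration; the sketch computes only the mean changes and does not reproduce the significance tests.

```python
from statistics import mean


def mean_changes(ratings):
    """ratings: one (obs1, obs2, obs3) tuple per teacher on a single subscale.

    Returns the mean change from baseline to the second observation and
    from baseline to the third observation.
    """
    change_1_vs_2 = mean(o2 - o1 for o1, o2, _ in ratings)
    change_1_vs_3 = mean(o3 - o1 for o1, _, o3 in ratings)
    return change_1_vs_2, change_1_vs_3


# Hypothetical subscale ratings for three teachers (illustrative only).
applicants = [(3.0, 4.0, 4.0), (4.0, 4.0, 3.5), (3.5, 4.0, 4.0)]
change_12, change_13 = mean_changes(applicants)
```

When the same teachers are observed at every time point, the mean of the individual changes equals the difference between the group means, which is one reason to restrict the analysis to teachers observed all three times.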


For the non-NBC teachers, we find no statistically significant differ- 
ences between their scores at baseline and the second time or third 
time they were observed on any of the nine LBD rubric subscales (see 
Table 8). For NBC applicants, only one of the subscales (classroom
atmosphere) shows a statistically significant increase over the baseline
observation.


Table 8: Changes over time for the overall rating and subscale ratings for NBC applicants and 
non-applicants. 


                          NBC applicants                      Non-applicants
                                      Change:    Change:                    Change:    Change:
                                      Obsv.      Obsv.                      Obsv.      Obsv.
                          Obsv. 1     1 vs 2     1 vs 3       Obsv. 1       1 vs 2     1 vs 3
Overall rating            3.77         0.20      -0.04        3.17           0.00       0.25
Lesson overview           4.17        -0.05      -0.05        3.40           0.30       0.15
Instructional overview    3.88        -0.01       0.06        3.28           0.22       0.11
Questioning               4.04        -0.19      -0.26        3.15           0.02       0.18
Classroom atmosphere      3.63         0.55*      0.53~       3.61           0.06       0.08
Higher-order skills       3.18         0.00       0.02        [illegible]   -0.25       0.21
Content knowledge         3.98        -0.21       0.05        3.33           0.07       0.00
Positive climate          4.43         0.01      -0.08        3.76           0.06       0.00
Implements instruction    3.99        -0.18       0.01        2.97           0.22       0.31
Assesses learning         3.63        -0.16      -0.10        2.86           0.42       0.33


NOTES: Scale ranges from 1 (low) to 5 (high). N=27 teachers. Significance was calculated using two-tailed t-tests of 
mean ratings for the baseline observation compared with each subsequent observation. *=difference is statistically 
significant using a 95 percent confidence level. ~=difference is statistically significant using a 90 percent confi- 
dence level. 


Figure 5 shows changes over time in the average ratings on the classroom
atmosphere subscale. The NBC applicants’ average increased
from baseline (3.6) to the second observation (4.2) and remained 
constant for the third observation (4.2). The improvement in the 
NBC applicants’ average scores was statistically significant for the sec- 
ond observation relative to the baseline observation, as well as for the 
third observation relative to the baseline. The mean rating for the 
non-applicants remained similar at 3.6 to 3.7 for all three
observations.


Figure 5: Average ratings on the “classroom atmosphere” subscale for NBC applicants and
non-applicants, by timing of observation.


NOTES: Scale ranges from 1 (low) to 5 (high). N=27 teachers. Significance was calculated using two-tailed t-tests of
mean ratings for the baseline observation compared to each subsequent observation. *=difference is statistically 
significant using a 95 percent confidence level. ~=difference is statistically significant using a 90 percent confi- 
dence level. 


To obtain the highest rating on the classroom atmosphere subscale, 
teachers must demonstrate the following: 


• Student involvement: All of the students demonstrated interest
and were engaged in the instructional activity.


• Classroom management: The classroom was well managed and
totally orderly; there were no student disruptions that caused
a loss of instructional time or impaired the learning
environment.


• Classroom culture: The teacher has established a classroom
culture in which all, or nearly all, of the students take initiative
in discussions and activities; all students demonstrated respect
for other students; all, or nearly all, demonstrated enthusiasm,
confidence, persistence, and accuracy while solving problems.


In one observation of an NBC applicant with a rating of 5, the
observer noted,


All students were actively involved in every stage of the
lesson for the full 100 minutes of class. They exhibited
curiosity, confidence, persistence, responsibility, accuracy, and
enthusiasm.


In another highly rated class, the observer described the classroom 
atmosphere by noting, 


No “down” time exists during this 60-minute class. All stu- 
dents are curious, persistent, confident, enthusiastic, and 
accurate in their work, and the environment is one of active 
thinking and learning from interaction among the content, 
the teacher, and the students. Students sit at science tables 
and discuss or share within a pair or threesome or across ta- 
bles in larger groups. 


This description seems to reflect what instruction would look like 
under the NBPTS “Architecture of Accomplished Teaching Helix,” as 
shown in Figure 1 in the introduction. If the teacher is meeting the 
needs of each student at the place that student is, then all students 
should be engaged in the activities and behaving in an orderly man- 
ner. The classroom culture should also reflect student initiative, re- 
spect, and enthusiasm for learning. 


Results: Changes in instructional quality for applicants with 
different baseline ratings 


One potential limitation to examining changes in instructional quality
for National Board applicants over time is the ceiling effect.
Because NBC applicants begin with higher ratings for instructional quality
than non-applicants at baseline, they may have limited room for im- 
provement. We conducted additional analyses to examine this possi- 
bility, which are described in Appendix F. 


We found no evidence that National Board applicants whose ratings
at the baseline observation were in the bottom quartile demonstrated
greater improvement over time than did applicants whose
baseline ratings were in the top quartile.
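One way a bottom-quartile versus top-quartile comparison could be set up is sketched below. This is an assumption for illustration only: the actual procedure is the one described in Appendix F, and the ratings, function name, and quartile method here are hypothetical.

```python
from statistics import mean, quantiles


def change_by_baseline_quartile(teachers):
    """teachers: list of (baseline, final) rating pairs for applicants.

    Returns the mean change for applicants at or below the first-quartile
    cut point and for applicants at or above the third-quartile cut point.
    """
    q1, _, q3 = quantiles([baseline for baseline, _ in teachers], n=4)
    bottom = [final - base for base, final in teachers if base <= q1]
    top = [final - base for base, final in teachers if base >= q3]
    return mean(bottom), mean(top)


# Hypothetical (baseline, final) ratings (illustrative only).
ratings = [
    (2.0, 2.5), (2.5, 2.5), (3.0, 3.5), (3.0, 3.0),
    (3.5, 3.5), (4.0, 4.0), (4.5, 4.5), (5.0, 4.5),
]
bottom_change, top_change = change_by_baseline_quartile(ratings)
```

If a ceiling effect were masking improvement, the bottom-quartile group would be expected to show clearly larger gains than the top-quartile group.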


Results: Classroom context 


Lastly, we examined differences in the physical setting subscale for 
NBC applicants and non-applicants. Even though these ratings of the
classroom context do not contribute to the ratings of instructional
quality, they are important for understanding limitations in the types 
of activities that teachers may be able to conduct during their lessons. 


The physical setting subscale is based on the following three items: 


• Classroom facilitates student learning: This item considers the
flexibility of student seating, the adequacy of utilities (e.g.,
electrical outlets), and whether flat top surfaces are available for
conducting hands-on activities.


• Classroom facility: This is based on whether the classroom is
adequate in size for the number of students, the adequacy of
resources and equipment, and the availability of furnishings for
activity-based instruction.


• Classroom environment: This item takes into account the
availability of materials, textbooks and reference books, computers
for student use, display of student work, and evidence of
ongoing projects.


During our study, observers visited classrooms that ranged widely in 
classroom environment. In one observation that scored a 1 for physical
setting, the observer noted,


The room was sufficiently large to accommodate 25 stu- 
dents, but the furnishings included individual desks and no 
lab facilities. However, the most significant obstacle to high 
quality science was the requirement for teachers to move to 
another room for each class. Although there was a storage 
room adjacent to the classroom, the few pieces of science 
equipment observed were outdated, and in some cases, in- 
operable. Walls were devoid of anything related to science, 
and there were no displays of student work or projects. 


Such classrooms limit the types of activities the teacher can conduct.
Conversely, during an observation that scored a 5, the observer
noted, 


Students worked at tables using laptops, iPads, and TI- 
84 Plus calculators. Mathematics displays promoted 
learning. 


As shown in Figure 6, there were differences by National Board applicant
status for two of the three physical setting items. NBC applicants
received higher ratings than non-applicants for “classroom
facilitates student learning” (mean of 4.1 versus 3.3, a difference of
0.8) and “classroom environment” (mean of 3.7 versus 2.5, a differ- 
ence of 1.2). This suggests that National Board applicants taught in 
classrooms that were better designed for student learning and had 
access to more instructional resources than did non-applicants. 


Figure 6: Average ratings on the three items of the “physical setting” subscale
for NBC applicants and non-applicants.


[Chart omitted. Groups: Classroom facilitates student learning, Classroom facility, Classroom environment. Series: NBC applicants, Non-applicants.]


NOTES: Scale ranges from 1 (low) to 5 (high). N=27 teachers. Ratings are from the base- 
line observation. Significance was calculated using two-tailed t-tests of mean ratings 
for NBC applicants compared with non-applicants. *=difference is statistically signifi- 
cant using a 95 percent confidence level. ~=difference is statistically significant using 
a 90 percent confidence level. 


All of the control teachers except for two were from the same school 
as one of the National Board applicants. This means that the differ- 
ences identified in the classroom context between applicants and 
non-applicants are occurring within the same school. These findings 
may suggest that National Board applicants are more resourceful in 
organizing their classrooms or obtaining the necessary resources to 
support productive learning than are non-applicants. 




Student outcomes 


As described in the previous section, we used qualitative data from 
classroom observations to address our first research question: 


1. Does the National Board certification process influence 
teachers’ classroom practices? 


The goal of the statistical analysis of student test scores described in 
this section is to answer our remaining questions: 


2. Are National Board-certified teachers more effective than
other teachers? 


3. Are applicants who attain National Board certification more 
effective than applicants who do not? 


4. What effect, if any, does the National Board certification pro- 
cess have on teacher effectiveness? 


To answer these questions, we compare different groups of
teachers. We explore question 2, which asks whether National
Board certification is a good signal of teacher effectiveness, by
comparing the effectiveness of National Board-certified teachers with
that of teachers who are not certified. Question 3, which considers
the effectiveness of National Board certification as a screening
process, is answered by comparing teachers who apply for and achieve
certification with those who apply for but do not achieve it. Question 4
addresses the professional development properties of the
National Board certification process itself, by comparing the
effectiveness of individual teachers against themselves at different stages
(before, during, and after) of their application process.


In each case, we will examine the evidence of teacher effectiveness as 
measured by student posttest scores on the ACT and the PLAN 
standardized tests.


Methods: Estimation model


We will use an “education production function” approach to relate
school, teacher, and student-level characteristics to the outcome, with 
the base statistical model being a standard linear regression model. 
Each observation represents an individual student linked to his or 
her current subject-area teacher (or set of subject-area teachers, in 
the case of students who had multiple teachers between pretest and 
posttest; see the “Description of the Data” chapter). All models cor- 
rect the standard errors for clustering of the data by teacher. 


Outcome (dependent) variables 


For each of these three research questions, the outcome variable is a
student’s test score. One set of models, which we refer to as the
“PLAN to ACT” analysis, uses the student’s ACT score as the outcome,
with the student’s previous PLAN score as the prior test score. A
second set of models uses the student’s PLAN score as the outcome, with
the student’s previous EXPLORE score as the prior test score. We
refer to this second set of models as the “EXPLORE to PLAN” analysis.
Separate models are run for each subject: math, English, and science. We
also run a combined model that includes all of the subjects, with
additional variables to indicate whether the observation outcome
represents a math, English, or science test score.* Results are also
presented separately for Kentucky and CPS.


One difference between our study and other studies in this literature
is that we do not have an annual student achievement measure. In
Kentucky, students typically take the EXPLORE at the beginning of
8th grade, the PLAN at the beginning of 10th grade, and the ACT at
the end of 11th grade. Thus, depending on the analysis, the prior test
score occurs three to four semesters before the posttest outcome.
Because there are multiple semesters between the prior score and the
outcome, and these are high school students who may switch teachers
from semester to semester, each student-level observation will involve
more than one teacher.


* So that we can compare scores across subject areas, we standardize all
test scores used in our models by subtracting the national-level
subject-specific mean from the student’s score and dividing by the
national-level subject-specific standard deviation.
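The standardization described in the note above is a z-score computed against national norms. A minimal sketch follows; the national mean and standard deviation used here are placeholder values for illustration, not actual EXPLORE, PLAN, or ACT norms.

```python
def standardize(score, national_mean, national_sd):
    """Convert a raw test score to national standard-deviation units."""
    if national_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - national_mean) / national_sd


# Placeholder norms (hypothetical values, for illustration only).
HYPOTHETICAL_MATH_MEAN = 21.0
HYPOTHETICAL_MATH_SD = 5.0

z = standardize(26, HYPOTHETICAL_MATH_MEAN, HYPOTHETICAL_MATH_SD)
```

A standardized score of 1.0 means the student scored one national standard deviation above the national mean for that subject, which puts math, English, and science scores on a common scale.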


In Kentucky, we observe the student’s course-taking each semester; 
so, for a given subject, there will typically be three or four teachers 
between the pretest and the outcome. In Chicago, the test-taking 
schedule is different, in that students typically take the EXPLORE in 
9th grade rather than 8th. Additionally, in Chicago, core courses typi- 
cally run for a full year; because we only observe student course- 
taking on a year-by-year basis, rather than each semester, there will be 
at most two teachers per student, per subject, between the pretest 
and the outcome for CPS analyses. 


Explanatory variables 


A challenge in estimating teacher effectiveness using longitudinal da- 
ta systems, as we do here, is that neither teachers nor students are 
randomly assigned to their classrooms, or to their schools. Education- 
minded parents choose housing taking school quality into account; 
teachers choose where to work based in part on the school’s quality; 
the most effective school leaders find ways to recruit early to obtain 
the best candidates; and once in their schools, principals assign stu- 
dents to teachers thoughtfully, not at random. 


As a result, there likely are systematic differences in student and 
teaching assignments that affect test scores, but that have nothing to 
do with National Board certification. Because of this challenge, for 
each analysis we use a variety of statistical controls and estimate five 
different regression models to get a fuller picture of the likely true ef- 
fect of National Board certification on student test scores. 


Model 1 is our baseline model. It includes the student’s prior score 
(the EXPLORE score in the case of models with the PLAN as the 
outcome, and the PLAN score in the case of models with the ACT as 
the outcome), by subject, to control for past student achievement. It 
also includes student age, the number of student absences (KY only), 
and standard demographic indicators for racial/ethnic background, 
gender, FRL eligibility, special education status (IEPs), and ESL status 
(KY only). Controlling for these observable student characteristics 
helps level the playing field when we compare student outcomes and 
attribute differences to teaching effectiveness. Model 1 also includes 
a control for the number of years of experience for each teacher for 
CPS; for Kentucky, it includes a proxy for experience, given by the 
number of years the teacher appears in the dataset.⁹ 


Model 1 likely overstates the true NBC effect, because it does not take 
into account all of the differences in students that may be present be- 
tween NBCTs and the comparison teachers. In addition, the model 
does not account for differences across schools in the contributions 
schools make to student performance, including the contributions 
of school leaders and administrators, instructional materials, and 
other students. But it does provide us with a best-case, baseline 
estimate of teacher effectiveness, comparing NBCTs to other teachers 
across the district or state, after controlling for the characteristics of 
students assigned to each teacher. 


Model 2 adds to model 1 a set of school characteristics, to control for 
across-school differences. Our school-level variables include total en- 
rollment, student-teacher ratio, racial/ethnic composition of the stu- 
dent body, and percentage of the student body FRL eligible. We also 
include, at the district level for Kentucky, the student-administrator 
ratio and per-pupil spending, as well as the urban-centric locale code 
(urban, suburban, town, or rural).¹⁰ We also include the school-level 
average pretest score (the EXPLORE for the analysis with the PLAN 
outcome, and the PLAN for the analysis with the ACT outcome) in 
English, math, and science, as a measure of the school’s overall 
achievement level. In model 2, our comparison is between NBCTs 
and other teachers in similar schools, controlling for characteristics 
of each student assigned to them. 


9. For Kentucky, if the student was assigned to multiple teachers, or the 
teacher was unknown, we treated the teacher experience proxy variable 
as missing data, and flagged the observation. For the average incoming 
prior test score, we calculated separate averages for students assigned to 
“BLOCK,” “MISSING,” and “MULTIPLE,” respectively. In Chicago, it is 
the overall average regardless of why the student does not have an indi- 
vidually identified teacher. For CPS, we also include “experience 
squared.” This variable accommodates the nonlinear relationship be- 
tween experience and teacher effectiveness. 


10. These variables are not included in the CPS model since it is a single dis- 
trict. 


Model 3 takes a step back and adds to model 1 the average prior test 
score for the group of students assigned to each teacher. Including 
this variable better accounts for within-school differences in how stu- 
dents are assigned to teachers that may be correlated with student 
outcomes. While model 1 controls only for the characteristics of 
individual students, model 3 takes into account the overall prior 
performance of students, which can affect instructional challenges in the 
classroom. 


Model 4 adds to model 1 both the school characteristics used in 
model 2 and the average prior test scores of students assigned to each 
teacher, used in model 3. This provides us with an estimate of the 
NBC effect to address the nonrandom assignment of students both 
across and within schools. 


Our final model, model 5, replaces the set of school characteristics 
in model 4 with a set of school-level fixed effects. The school fixed- 
effects model provides a stronger control for differences across 
schools that may affect our measurement of teacher effectiveness, be- 
cause it provides a way to account for time- and subject-invariant 
school-specific factors that influence student performance that we 
otherwise cannot observe in our data. 


In general, we expect model 5 to provide our most conservative esti- 
mate of the effectiveness of NBCTs compared with other teachers. 
However, this model may actually understate the difference in effec- 
tiveness between NBCTs and other teachers, because teachers also 
sort themselves across schools. Indeed, unlike model 1, which provides 
a (likely overstated) estimate of the effectiveness premium of 
NBCTs compared with a typical teacher in the system, model 5 pro- 
vides an estimate of the National Board effect in comparison to a typ- 
ical teacher in the same school. Because teachers within a single 
school are generally more similar to one another than to other 
teachers from across the district or state, all else equal, this teacher 
self-sorting likely will reduce estimates of the relative effectiveness of 
National Board-certified teachers. 
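
As a compact summary of how the five specifications nest, the sketch below lists the control sets for each model. The variable names are hypothetical labels chosen for this illustration, not the names used in the actual datasets, and the model 5 entry follows the description given in the results sections (student characteristics, teacher experience, the teacher's incoming students' average prior score, and a school-level fixed effect).

```python
# Hypothetical labels for the groups of controls described in the text.
STUDENT = ["prior_score", "age", "absences", "race_ethnicity",
           "gender", "frl", "iep", "esl"]
TEACHER = ["teacher_experience"]
SCHOOL = ["enrollment", "student_teacher_ratio", "racial_composition",
          "pct_frl", "school_avg_pretest"]
CLASS_AVG = ["teacher_avg_prior_score"]
SCHOOL_FE = ["school_fixed_effects"]


def covariates(model):
    """Return the list of controls for model 1 through model 5."""
    base = STUDENT + TEACHER
    if model == 1:
        return base                          # baseline
    if model == 2:
        return base + SCHOOL                 # adds school characteristics
    if model == 3:
        return base + CLASS_AVG              # adds teacher-level average prior score
    if model == 4:
        return base + SCHOOL + CLASS_AVG     # combines models 2 and 3
    if model == 5:
        return base + CLASS_AVG + SCHOOL_FE  # school fixed effects
    raise ValueError("model must be an integer from 1 to 5")
```

Reading the specifications this way makes the comparison group explicit: models 1 and 3 compare across the district or state, models 2 and 4 compare across observably similar schools, and model 5 compares within the same school.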


National Board status indicators 


After controlling for the variables described above, the covariates of 
interest will be the set of indicators that summarize a teacher’s status 
with respect to the National Board certification process. The precise 
set of indicators will differ depending on the research question being 
addressed. 


Methods: Signaling effect 


To test for a signaling effect of National Board certification, we com- 
pare the test scores of students who had one or more National 
Board-certified teachers between the pre- and the posttest with 
scores of students who had no NBCTs between the tests. If National 
Board certification is an effective signal of teaching quality, then stu- 
dents taught by certified teachers should perform better on tests than 
students taught by non-certified teachers. 


We will estimate a model that includes an indicator variable that 
equals 1 if the student had a National Board-certified teacher in the 
tested subject area in any semester (or any year, for CPS students) in 
which we observe the teacher, and 0 otherwise. This model provides a 
comparison of the performance of students who had at least one Na- 
tional Board-certified teacher between the pre- and the posttest with 
the performance of those students who had none. 
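
The signaling indicator can be illustrated with a small helper. This is a sketch under assumed data structures: each student is represented by a list of teacher-period records (one per semester in Kentucky, one per year in CPS), and the keys `subject` and `is_nbct` are hypothetical names introduced for the example.

```python
def had_nbct(teacher_periods, subject):
    """Return 1 if any teacher in the tested subject was an NBCT, else 0.

    teacher_periods: list of dicts, one per semester (or year) between the
    pretest and the posttest, each with 'subject' and 'is_nbct' keys.
    """
    return int(any(
        period["subject"] == subject and period["is_nbct"]
        for period in teacher_periods
    ))
```

Note that the indicator is binary: a student with three semesters of NBCT instruction and a student with one semester both receive a value of 1.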


Methods: Screening effect 


To test for a screening effect, we compare the performance of 
students who had teachers who will ever achieve certification (“ever 
certified”), whether before, during, or after the timeframe of our 
analysis, with the performance of students who had teachers who 
have applied or will apply but do not achieve certification (“never certified”). If 
the National Board certification process is an effective screening de- 
vice for high-quality teachers, then students taught by “ever certified” 
teachers should perform better on tests than students taught by “nev- 
er certified” teachers. 


For the screening model, the teacher status variable will indicate the 
total number of semesters (or years, in the case of CPS students) that 
the student had a National Board-certified teacher. We will include 
three variables for both the EXPLORE to PLAN and the PLAN to 
ACT analyses: the number of semesters taught by an “ever certified” 
teacher, the number of semesters taught by a “never certified” teacher, 
and the number of semesters taught by a “never certified-withdrawn” 
teacher. This formulation allows us to distinguish the program 
effects by the amount of instructional contact the student 
had with a particular type of teacher. 


To estimate the screening effect size, we can ask the following ques- 
tion: what would be the effect on a student’s test score if we replaced 
a “never certified” teacher with an “ever certified” teacher? The quan- 
tity we are looking for will be the difference between the coefficient 
on the status indicator for “ever-certified” teachers and the coeffi- 
cient on the status indicator for “never certified” teachers. 
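
The two steps of the screening calculation, counting semesters by teacher status and then differencing the two coefficients, can be sketched as follows. The status labels, function names, and any coefficient values passed in are hypothetical and serve only to illustrate the arithmetic; they are not estimates from this study.

```python
from collections import Counter

# Hypothetical labels for the three applicant status categories in the text.
STATUSES = ("ever_certified", "never_certified", "never_certified_withdrawn")


def semesters_by_status(statuses):
    """Count semesters (or years) a student spent with each applicant type.

    statuses: one label per semester between the pretest and the posttest;
    labels outside STATUSES (e.g., nonparticipating teachers) are ignored.
    """
    counts = Counter(s for s in statuses if s in STATUSES)
    return {status: counts.get(status, 0) for status in STATUSES}


def screening_effect(coefficients):
    """Difference between the 'ever certified' and 'never certified' coefficients."""
    return coefficients["ever_certified"] - coefficients["never_certified"]
```

For instance, with made-up coefficients of 0.05 on "ever certified" and -0.02 on "never certified", the implied effect of swapping one semester of a "never certified" teacher for an "ever certified" teacher would be about 0.07 standard deviations.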


Methods: Human capital effect 


To estimate the effect of the certification process itself on teacher ef- 
fectiveness, we want to compare the student performance of teachers 
who have completed the application process (“past applicant”) with 
the student performance of these same teachers when they were ap- 
plicants (“current applicant”), and with the performance of their 
students before they started the certification process (“future appli- 
cant”). If the National Board certification process itself is effective 
professional development, then we should expect to see a positive 
coefficient on the “past applicant” indicator—implying that students 
of past applicants have higher levels of achievement than students of 
future applicants do. 


Additionally, some previous studies have found evidence that current 
applicants may be less effective than either past or future applicants. 
We can use this model to investigate any such potential effects in our 
sample.¹¹ 


11. Note that for human capital models, for both Kentucky and Chicago, we 
define the application status variables as spanning one academic year 
(rather than one semester, as is the case with Kentucky signaling and 
screening models). The models therefore include one teacher per stu- 
dent per year in a subject. For Kentucky students who have more than 
one teacher in a school year, we created a special “MULTIPLE” teachers 
category for variables that depend on the identity of the teacher. We 
adopt this approach because of identification concerns with respect to 
the teacher fixed effects model that we use for the human capital effect 
estimates. 


Teacher fixed effects in human capital models 


In human capital models, we include a set of teacher fixed effects for 
the current teacher in a subject.¹² The idea behind this approach is 
that our basic model includes teacher fixed effects and the NBC sta- 
tus variables that for an individual teacher will change over time as 
the teacher moves through the application process. Therefore, we 
identify only the effect of going through the process for teachers who 
are changing status during the timeframe in which we observe stu- 
dent data. We are estimating the human capital effect by comparing 
the same teacher with himself or herself over time. 
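
The identification argument above can be made concrete with a deliberately simplified within-teacher estimator. The sketch assumes a single 0/1 status indicator and no other covariates, which is far simpler than the models estimated in this study; the function name and the input layout are assumptions made for illustration.

```python
from collections import defaultdict


def within_teacher_slope(rows):
    """Estimate a status effect using only within-teacher variation.

    rows: iterable of (teacher_id, status_indicator, score) tuples, where
    status_indicator is 0 or 1. Both variables are demeaned within teacher,
    which is equivalent to including a fixed effect for each teacher.
    """
    groups = defaultdict(list)
    for teacher_id, x, y in rows:
        groups[teacher_id].append((x, y))

    numerator = 0.0
    denominator = 0.0
    for observations in groups.values():
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            numerator += (x - x_bar) * (y - y_bar)
            denominator += (x - x_bar) ** 2

    if denominator == 0:
        raise ValueError("no teacher changes status; the effect is not identified")
    return numerator / denominator
```

A teacher whose status never changes has a demeaned indicator of zero in every period, so that teacher adds nothing to either sum. Only teachers who change status during the observation window contribute to the estimate, which is the point made in the paragraph above.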


Results: Signaling effect 


This section and the next two sections present the results of the 
statistical analyses of student test scores to estimate any signaling, 
screening, and human capital effects of the National Board certification 
process. For each analysis, we estimated a series of statistical models 
incorporating a range of different covariates. Results presented for 
model 1, with the fewest controls, include a set of student characteris- 
tics and teacher experience (a proxy variable—the number of years 
the teacher appears in the dataset—in the case of Kentucky). Results 
presented for model 5 include controls for student characteristics, 
teacher experience, teacher incoming students’ average prior test 
score, and a school-level fixed effect. 


We compare the results from model 1 with model 5 in the body of 
the report to provide an estimate of the range of effect sizes, depend- 
ing on the statistical controls included in the model. Complete results 
for all models (1 through 5) are presented in Appendix E. 


12. For EXPLORE to PLAN analyses, the current teacher is the grade 9 
teacher because the PLAN tests are administered in the fall of grade 10. 
For PLAN to ACT, the current teacher is the grade 11 teacher because 
the ACT is administered in the spring of grade 11. To reiterate, for Ken- 
tucky students who have more than one teacher in a school year, we cre- 
ated a special “MULTIPLE” teachers category for that year. In both 
Kentucky and CPS, we include fixed effects for both the 10th and 11th 
grade teachers in the PLAN to ACT analysis. 


To estimate the signaling effect, we compared teachers who currently 
are National Board certified with those who are not. Figure 7 summa- 
rizes our estimates of the signaling effect, by subject. To measure the 
effect size of having an NBCT, the indicator equals 1 if the student 
had a National Board-certified teacher in any semester or school year 
in the tested subject area between the pre- and posttest, and equals 0 
if the student did not. The coefficient can be interpreted as the effect 
size on the outcome variable (i.e., the number of standard deviations 
of change in the outcome variable) associated with having at least 
one National Board-certified teacher in that subject between the pre- 
and posttest. 


Figure 7: Estimates of signaling effects of National Board certification. 
CPS ACT. Significance was calculated using multiple regression models for the effect of having an NBCT in any se- 


For Kentucky math students, there is a positive and statistically 
significant, although small, effect on both ACT and PLAN scores of having 
at least one NBCT in the subject area between the pre- and posttest. 
The effect size ranges from a .070 to .122 standard deviation increase 
in both ACT and PLAN math scores.¹³ For English on the ACT 
outcome only, there is a positive effect of 0.076 in model 1. However, the 
signaling effect is not statistically significant at conventional levels in 
model 5 for English. 


For CPS, results in model 1 are positive and statistically significant for 
all subject areas in both the PLAN and the ACT analysis, with effect 
sizes ranging from .079 in the English PLAN analysis to .304 in the 
science PLAN analysis. When additional control variables are added 
in model 5, statistically significant effects are present only for English 
on the PLAN outcome (effect size of .056), and for math and English 
on the ACT outcome (effect sizes of .077 and .062, respectively). 


Results: Screening effect 


To estimate the screening effect, we compare test scores of students 
taught by teachers who currently hold or in the future will hold National 
Board certification with test scores of students taught by teachers who 
have applied for certification in the past, or will do so in the future, 
but who do not achieve certification. As mentioned, we measure the 
screening effect by the difference between the coefficient on the status 
indicator for number of semesters (or years) with an “ever certified” 
teacher and the coefficient on the status indicator for number of 
semesters (or years) with a “never certified” teacher. 


In Figure 8, the effect size should be interpreted as the change in 
score that would be brought about by replacing one “never certified” 
teacher with one certified teacher.¹⁴ The results for Kentucky indicate 
a small but statistically significant screening effect for math on 
the PLAN and the ACT outcomes, with effect sizes ranging from .036 
to .085. This suggests that, in Kentucky, the National Board 
certification process does screen in math teachers who are slightly more 
effective compared with those who do not achieve certification. 


13. An effect size of .07 implies that if Kentucky reassigned the median 
student to an NBCT, the student would move from the 50.0th to the 53.7th 
percentile on the ACT math. 

14. Appendix E, Table 17 (math), Table 18 (English), and Table 19 
(science) present additional results for various specifications of the 
screening model, by subject, outcome, and National Board status variable. 
To interpret the coefficients in these tables, note that the comparison 
group (the omitted group) consists primarily of nonparticipating teachers, 
plus a few participants whose ultimate status we do not observe. 


In Chicago, the differences between successful and unsuccessful cur- 
rent, future, or past applicants are a mix of nonsignificant and statis- 
tically significant positive effects of the NBC process. In model 1, the 
results are positive and significant in all subject areas for both the 
PLAN and the ACT outcomes (except for English on the ACT out- 
come), with effect sizes ranging from .067 (English on the PLAN out- 
come) to .240 (science on the PLAN outcome). In model 5 there are 
positive effects in English (with effect sizes of .056 on the PLAN 
outcome and .041 on the ACT outcome), and in math on the ACT 
outcome (effect size of .071).¹⁵ 


15. As a sensitivity test, we also estimate the screening model by comparing 
applicants who achieved with applicants who did not achieve before they 
ever entered the application process (i.e., when they are pre-applicants). 
This tests for the presence of a screening effect before teachers’ practic- 
es may be influenced by the certification process. In Kentucky, the re- 
sults are similar for almost all models, except that the effect for math on 
the PLAN outcome is no longer statistically significant. In CPS, the ef- 
fects for English are no longer statistically significant for the PLAN and 
ACT outcomes. In addition, the effect for math is no longer statistically 
significant on the ACT outcome. 




Figure 8: Estimates of screening effects of National Board certification. 


Results: Human capital effect 


To estimate the human capital effect, we compare the same teacher 
with himself or herself over time as the teacher moves from future 
applicant to current applicant to past applicant. The model includes 
National Board status indicators for whether the teacher is currently 
in, or has in the past participated in, the National Board application 
process, along with a current teacher fixed effect, a school-level fixed 
effect, and student characteristics. 


The omitted category is “future applicant,” so the coefficient (“effect 
size”) should be interpreted as the change in outcome score (in 
standard deviations from the national mean) resulting from having a 
teacher who is a current (or past) NBC applicant relative to having 
the same teacher at a stage in her or his career when she or he had 
not yet applied for certification. The coefficients should therefore 
pick up any effect on test scores from teachers who have gone 
through (past applicant), or are going through (current applicant), 
the National Board certification process. The results of all subject ar- 
eas are pooled due to the small number of teachers who change sta- 
tus in the certification process during the timeframe of the analysis. 


Figure 9 summarizes the results. We find little evidence of a human 
capital effect; students of past or current applicants do not, in gen- 
eral, perform differently from students of the same teachers before 
they had applied for National Board certification (future applicants). 
The effect sizes on both the current and past applicant indicator vari- 
ables are small and not statistically significant in Kentucky or Chicago 
for the PLAN and the ACT outcomes. 


Figure 9. Estimates of human capital effects of National Board certification. 


Conclusions 


Key findings 


In summarizing and evaluating an accumulation of research, the Na- 
tional Academy of Sciences concludes, “The evidence is clear Nation- 
al Board certification distinguishes more effective teachers from less 
effective teachers with respect to student achievement” (National Re- 
search Council, 2008, p. 179). But the extant literature leaves under- 
studied, and unresolved, whether undergoing the certification 
process itself also improves a teacher’s effectiveness. 


The results of this study go beyond estimation of signaling and 
screening effects by examining the evidence of improvement in 
teacher effectiveness as teachers progress through the NBC process 
and ultimately become National Board-certified professionals. We 
also examine differences in performance growth between successful 
and unsuccessful applicants to determine whether the certification 
process is an effective screening tool. 


Moreover, the signaling literature, which compares outcomes of 
students with and without NBCTs, focuses almost exclusively on 
statistical comparisons in just two states, Florida and North Carolina. 
Another contribution of the current study is that it uses data from 
two new locales, the state of Kentucky and the city of Chicago, that 
together include rural, suburban, and large urban districts. Both lo- 
cales are strong supporters of the NBC process, as evidenced by the 
proportions of teachers who hold certification in those locations. The 
statistical analysis focuses on student outcomes in English, science, 
and math at the high school level. To complement the large-scale 
longitudinal analyses of student achievement, we conducted class- 
room observations to examine changes in the quality of instruction 
over time, comparing applicant teachers as they progressed through 
the NBC process and non-applicants. 


We find that when NBC applicants are observed as they are beginning 
the certification process, they already have higher ratings of 
instructional quality than non-applicants. These results are seen in 
teachers’ overall ratings as well as ratings on six of nine of our rubric 
subscales of teaching quality: lesson overview, questioning, content 
knowledge, positive climate, implements instruction, and assesses 
learning. 


However, there are few changes in instructional quality over time. There 
are no significant differences from the first observation to the second 
or third observation in the overall rating or the ratings for eight of 
the nine subscales for both NBC applicants and non-applicants. The 
one area where we did find an improvement over time for NBC 
applicants is the subscale for classroom atmosphere. This subscale takes 
into account student involvement, classroom management, and 
classroom culture, reflecting many of the ideals represented in NBPTS’s 
“Architecture of Accomplished Teaching Helix.” 


In the statistical analysis of student outcomes, we find small signaling 
effects of NBC in several subject areas and outcomes. Students who 
have at least one National Board-certified teacher between taking the 
pretest and the posttest tend to score slightly better on the posttest 
compared with students who do not have a certified teacher. We also 
find small screening effects of the NBC process. The certification 
process does seem to identify slightly more effective teachers 
compared with those who do not achieve certification. We find little 
evidence, however, of a human capital effect of undergoing the NBC 
process for Kentucky or Chicago teachers. Students of past or current 
applicants do not, in general, perform differently on the posttest 
than do students of the same teachers before they had applied for 
National Board certification. 


Limitations 


There are several limitations of the study that should be taken into 
consideration when reflecting on its findings. First, there may be a 
ceiling effect on the growth of National Board applicants. National 
Board teachers start out with higher ratings in instructional quality 
and higher levels of student performance at the beginning of the cer- 
tification process, and so have less room for improvement than do 
non-applicants. 


Second, ours is a relatively small sample of National Board applicants 
in both the classroom observations and the statistical analysis. This 
makes it difficult to detect statistically significant changes over time. 
In the human capital model, for example, we had hoped to measure 
possibly differential changes over time for teachers who achieved, did 
not achieve, and withdrew from NBC during the study period. 
However, there was an insufficient number of each. 


Third, there are limitations with the timing of the classroom observa- 
tions. For NBC participant teachers in the study, it is not possible to 
observe them before they have had any involvement in the National 
Board certification process, because we cannot know teachers’ inten- 
tions until they apply. While we tried to observe teachers as close to 
their becoming applicants as possible, the baseline observations may 
still reflect some exposure to the certification process. 


Some teachers enroll in NBPTS’s preparatory Take One! activity be- 
fore they apply for certification, for example, so they may already be 
making changes to their teaching practices when they are new appli- 
cants. In addition, the last of the classroom observations was con- 
ducted three semesters into the NBC process. At this point, 
applicants typically have completed most certification activities (e.g., 
submitting portfolio entries), but they may not be entirely finished. 
Thus, applicants may continue to change their teaching practices in 
response to what they are learning from certification beyond that last 
observation. This is especially true for applicants who may have been 
unsuccessful on their first attempt at certification and go on to reap- 
ply. In addition, both NBC applicants and non-applicants may have 
been exposed to other types of professional development related to 
instructional strategies (particularly with regard to formative and 
summative assessment), which could influence instructional quality 
during the observation period. 


Last, there are several limitations to the statistical analysis. The data 
collected for analysis included a limited number of characteristics for 
students, teachers, and schools. Our description of the data indicates 
that differences exist between students and schools with NBCTs and 
those without. This suggests there is selection bias in how teachers are 
distributed among and within schools. Our statistical models control 
for some of these differences, but there may be other unobserved 
factors contributing to changes in student outcomes that are also 
correlated with a teacher’s NBC status. 


Another analytic limitation is that the time lag between the pretest and 
the posttest makes it difficult to attribute changes in student 
achievement to an individual teacher. Our statistical models take into 
account the effect of all the teachers for that specific subject that stu- 
dents had between the two tests, but this may diminish the impact of 
any one teacher. If a student had a high-quality teacher in the pretest 
year and a low-quality teacher in the posttest year, we might expect 
the student’s growth to be lower than if the student had the high- 
quality teacher for the full duration. 


Implications for future research 


The results of our study have several important implications for 
future research on National Board certification. Our analysis of 
classroom observations suggests that it is important for studies to look 
beyond traditional outcomes of student achievement (test scores), 
and also consider teacher outcomes such as quality of instruction. 
While we found that National Board applicants demonstrate im- 
provement in classroom atmosphere over time, we need more con- 
text for understanding what caused these changes in classroom 
atmosphere. Surveys or interviews could be conducted with National 
Board applicants to better understand how and why they may have 
changed their classroom practices as a result of participating in the 
certification process. Future research could also examine portfolios 
of students’ work for teachers before and after participating in the 
certification process. This would provide evidence of whether NBCTs 
are more effective at challenging students and at creating individual- 
ized assignments based on where students are at—skills that are em- 
phasized in NBC. 


The statistical analysis of student outcomes could be extended by ob- 
taining a larger sample, by adding either more years of data or addi- 
tional locales. This would allow for a more nuanced examination at 
various stages of the certification process. For example, do outcomes 
differ for applicants who achieve certification upon their first attempt 
compared with applicants who achieve it after two or more applica- 
tion cycles? With a larger sample, it would also be possible to examine 
differences by certification type (e.g., math teachers with Generalist 
certifications compared with teachers with Mathematics/Adolescence 
and Young Adulthood certifications). 


Implications for practice 


Large investments have been made in the development of the Na- 
tional Board for Professional Teaching Standards certification pro- 
gram. As of September 2005, the National Science Foundation and 
the U.S. Department of Education had appropriated more than $149 
million to it, and nongovernment funders had spent an additional 
$261 million (Cohen & Rice, 2005). Additionally, there are ongoing 
costs incurred by applicants or (more typically) their 
sponsoring school systems. 


As a result of these investments, there is a great deal of interest in 
identifying and measuring the full value to education systems of en- 
couraging teachers to obtain National Board certification. The “sig- 
naling” value of certification has been demonstrated, and the long- 
term benefits to improvements in the workforce have been postulat- 
ed, but there also is interest in measurement of more immediate ef- 
fects of certification on the instructional effectiveness of participants 
in the program. 


Although its findings are modest, this study contributes to better un- 
derstanding the full benefits of encouraging National Board certifica- 
tion, which may inform future budget decisions by districts or state 
departments of education about subsidization of the NBC process. 
Although the cost of the NBC program has been considerable, in fact 
it is much less expensive than raising teacher salaries enough across 
the board to make up for years of salary declines (in real terms and 
relative to other professions requiring similar skills) that may have 
weakened the quality of new entrants to the profession and the teach- 
ing workforce generally (e.g., Burke et al., 2004). 


Given that the National Board certification process has repeatedly 
demonstrated the ability to distinguish between more- and 
less-effective teachers, school systems should think about how to make 
good use of this tool. For example, school systems could use National 
Board certification as a gatekeeper for tenure, implemented at a later 
point in the teaching career path than the criteria most school sys- 
tems currently use for those decisions. School systems could also link 
certification to compensation. Over time, pay differences would be 
expected to encourage certified teachers to stay in teaching, and 
unsuccessful applicants to leave, creating openings for new, more 
promising entrants. 


Appendix A. Leadership by Design 


Science classroom observation instrument 


SCIENCE CLASSROOM OBSERVATION INSTRUMENT 

Level/Class ___ Lesson Title ___ Total # Students ___ Gender #: M ___ F ___ 
# Minority ___ # Inclusion ___ Length of Observation ___ 

Learning Objective of the Lesson ___ 


I. LESSON OVERVIEW 

A. Learning Objective of the Lesson (Mark all that apply) 

[ ] Clearly communicated by the teacher using multiple means 
[ ] Communicated orally only 
[ ] Communicated in writing only 
[ ] Student activities consistent with the lesson objective(s) 
[ ] Student activities not consistent with the lesson objective(s) 
[ ] Lesson objective communicated but not clear 
[ ] Lesson objective not communicated 


B. Major Instructional Resource used in the lesson observed - Mark with a “1” the primary/predominant 
resource influencing instruction and following numbers (2, 3, etc.) if more than one observed - Sections B, D, E. 

[ ] Textbook 
[ ] Other Print Materials (worksheet, manual, etc.) 
[ ] Technology-Based Presentation Media (Power Point, Smart Board, [illegible], VCR, Overhead Projector) 
[ ] Hands-on/manipulative materials (laboratory materials) 
[ ] Calculators 
[ ] Computer 
[ ] Graphing Calculator 
[ ] Technology Probes 
[ ] Centers 
[ ] None of Above 


C. Content Delivery (Mark all that apply) 
[ ] Age/grade level appropriate 
[ ] Content presented is accurate 
[ ] One or more content errors 
[ ] Student misconception not corrected 

D. Place in Instructional Sequence 
[ ] Introduction of new concept 
[ ] Develop conceptual understanding 
[ ] Apply concept to new situation 
[ ] Review concept or procedure 
[ ] Assess student understanding 

E. Seating Arrangement for Lesson 
[ ] Whole group 
[ ] Small groups working on same task 
[ ] Small groups working on different tasks 
[ ] Individuals working on same task 
[ ] Individuals working on different tasks 

II. INSTRUCTIONAL OVERVIEW - Mark with a “1” the primary/predominant resource influencing instruction 
and following numbers (2, 3, etc.) if more than one observed - Sections A & B. 

A. Instructional Strategy 
[ ] Teacher lecture 
[ ] Teacher demonstration 
[ ] Teacher-led discussion 
[ ] Individual assistance 
[ ] Student presentation 
[ ] Small group discussion 
[ ] Student investigation 
[ ] Student Experiment 
[ ] Using a Model to Teach a Science Concept 

B. Student Activity 
[ ] Listening to/observing teacher presentation 
[ ] Participating in discussion (teacher led or small group) 
[ ] Conducting investigation 
[ ] Conducting student or teacher initiated experiment 
[ ] Print-based Activity: Reading, answering questions 
[ ] Working on written assignment (science notebook, writing a lab report, etc.) 
[ ] Taking a test 
[ ] Using education software program 
[ ] Using technology for research 
[ ] Using computer for inputting/analyzing data 
[ ] Student Presentation and/or Listening to Other Student Presentation 
[ ] Developing/Using a model to learn or clarify a concept 


III. QUESTIONING 

A. Quality of the Questions (Mark only one) 
[ ] Questions were mostly convergent, focusing on factual recall 
[ ] Questions were mostly divergent and stimulated broad student responses 
[ ] Appropriate balance of divergent and convergent questions 
[ ] No questions were asked by teacher or posed through the activity being conducted 

B. Questioning Techniques (Check all that apply) 
[ ] Students are encouraged to ask questions of each other and/or teacher 
[ ] Questions stimulated higher level and divergent thinking 
[ ] Appropriate wait time 
[ ] All students have an opportunity to respond 
[ ] Most students have an opportunity to respond 
[ ] Only a few students have an opportunity to respond 
[ ] Teacher provides appropriate feedback to student responses 

IV. CLASSROOM ATMOSPHERE (Mark one response in each section A and B) 


A. Student Involvement (Check only one) 
[ ] All or nearly all students demonstrate interest and were engaged 
[ ] Majority of students demonstrate interest, were engaged 
[ ] Approximately equal numbers of students interested/engaged and not interested/not engaged 
[ ] Majority of students uninterested or apathetic; generally not engaged 
[ ] Nearly all of the students were uninterested and not engaged 

B. Classroom Management (Check only one) 
[ ] Classroom orderly, no student disruptions which impaired learning environment 
[ ] Classroom generally orderly but some student disruptions which required corrective action 
[ ] Classroom disorderly, frequent student disruptions which seriously impaired the learning environment 

C. Classroom Culture (Check all that apply) 
[ ] Curiosity 
[ ] Cooperation with teacher and/or other students 
[ ] Persistence 
[ ] Responsibility 
[ ] Confidence in ability to “do” science 
[ ] Enthusiasm for learning 
[ ] Objectivity in analyzing data 
[ ] Accuracy 
[ ] Use of critical thinking skills 


A. Amount of Student Investigation/Research (Mark only one box for Section A) 
[ ] Students are engaged in an investigation/research which may include skills 1-7, however the emphasis is on higher level 
skills 8-16. 
[ ] Students are engaged in investigation/research in which the focus of lesson is on the basic process skills 1-7. 
[ ] Students are not involved in any type of investigation/research involving hands-on or laboratory activity. 

B. Level of Student Investigation/Research (Mark only one box for Section B) 
[ ] Students design and carry out an experiment to solve a problem initiated by a teacher or student question. 
[ ] Students are investigating a science phenomenon using a preplanned activity which requires the collection and analysis 
of data to solve a problem or create a product. 
[ ] Students are investigating a science phenomenon using a preplanned activity which provides a definitive procedure and 
requires a specific response to be correct. Does not necessarily involve collection and analysis of data. 
[ ] Students are not involved in any type of investigation/research involving hands-on or laboratory activity. 

C. Scientific Skills Being Developed (Check all skills which are introduced and/or developed in the 
observed lesson) 

Basic Skills (Mark all that are observed) 
[ ] 1. Observing 
[ ] 2. Measuring 
[ ] 3. Classifying 
[ ] 4. Inferring 
[ ] 5. Predicting 
[ ] 6. Communicating 
[ ] 7. Investigating (Basic Level) 

Higher Level Skills 
[ ] 8. Investigating (involves Analysis of Data) 
[ ] 9. Designing Experiments 
[ ] 10. Formulating Hypotheses 
[ ] 11. Conducting Experiment 
[ ] 12. [illegible] 
[ ] 13. Interpreting Data 
[ ] 14. Forming Conclusions 
[ ] 15. Evaluating Data 
[ ] 16. Interpretive Discussion 


VI. TEACHER DEMONSTRATES APPLIED CONTENT KNOWLEDGE (Mark one response for 
each section) 

A. Communication 
[ ] Consistently used accurate and effective communication; vocabulary is clear, correct and appropriate. 
[ ] Generally used accurate and effective communication; occasional use of inappropriate vocabulary. 
[ ] Consistently used inaccurate and ineffective communication and/or inappropriate vocabulary. 

B. Connects Content to Life Experiences (Mark one response in this section) 
[ ] Consistently connected most content/procedures/activities with relevant life experiences. 
[ ] Connected some content/procedures/activities with relevant life experiences. 
[ ] Rarely or never connected content/procedures/activities with relevant life experiences. 

C. Instructional Strategies Appropriate for Content and Contribute to Student Learning 
[ ] Used instructional strategies that were clearly appropriate for the content/processes of the lesson. 
[ ] Used instructional strategies that were generally appropriate for the content/processes of the lesson. 
[ ] Used instructional strategies that were questionable or inappropriate for the content/processes of the lesson. 

D. Guides Students to Understand Lesson Content from Various Perspectives to Extend Understanding 
[ ] Provided multiple opportunities for students to consider content from a different context or perspective. 
[ ] Provided a single opportunity for students to consider content from a different context or perspective. 
[ ] Never provided an opportunity for students to consider content from different context or perspective. 


VII. TEACHER CREATES AND MAINTAINS LEARNING CLIMATE (Mark one response for each 
section) 

A. Communicates High Expectations 
[ ] Significant/challenging lesson objectives; teacher consistently communicates confidence in students’ ability to achieve. 
[ ] Challenging objectives; some communication of confidence in students’ ability to achieve. 
[ ] Minimal objectives for students; rarely or never communicates confidence in students’ ability to achieve. 

B. Establishes Learning Environment 
[ ] Clear conduct standards; awareness of student behavior, responded appropriately/respectfully. 
[ ] Conduct standards but some inconsistency in monitoring and response to student behavior. 
[ ] No established conduct expectations; minimal or no monitoring, inappropriate responses to behavior. 

C. Values and Supports Student Diversity 
[ ] Recognized and consistently responded to the diversity in the class (gender, ethnicity, academic and physical abilities); 
consistently used or attempted to use strategies to address the needs of all students. 
[ ] Recognized but inconsistently responded to the student diversity; used or attempted to use some different strategies to 
address the needs of different students. 
[ ] Little or no recognition or response to student diversity and individual needs; used the same approach for all students. 

D. Fosters Mutual Respect Between Teacher and Students and Among Students 
[ ] Always treated all students with respect; encouraged and clearly expected students to treat each other with respect. 
[ ] Generally treated students with respect; some encouragement of students to treat each other with respect. 
[ ] Did not show respect or concern for students; little or no encouragement of students to treat each other with 
respect. 

E. Provides a Safe Environment for Learning 
[ ] Classroom environment was emotionally and physically safe for students at all times. 
[ ] Classroom environment was emotionally and physically safe for students most of the time. 
[ ] Classroom environment was not emotionally and/or physically safe for students. 


VIII. TEACHER IMPLEMENTS AND MANAGES INSTRUCTION (Mark one response for each 
section) 

A. Implements Instruction Based on Student Needs and Assessment Data 
[ ] Instruction addressed individual student needs; always used or attempted to use an appropriate instructional strategy to 
meet individual student needs; adapted instruction to changing or unanticipated circumstances. 
[ ] Instruction addressed most individual student needs; used more than one strategy as needed; sometimes adapted 
instruction to meet changing or unanticipated circumstances. 
[ ] Instruction did not address individual student needs; one strategy was used for all students; no attempt to adapt lesson to 
meet changing or unanticipated circumstances. 

B. Uses Time Effectively 
[ ] Always used efficient procedures for non-instructional tasks (handling materials/supplies, managing 
transitions, organizing work, etc.) so there is minimal loss of learning time. 
[ ] Inconsistently used efficient procedures for non-instructional tasks causing some loss of learning time. 
[ ] Used inefficient procedures for non-instructional tasks resulting in significant loss of learning time. 

C. Uses Space and Materials 
[ ] Consistently used classroom space and materials effectively to facilitate student learning. 
[ ] Classroom space and/or materials were not always used effectively to facilitate student learning. 
[ ] Ineffective use of classroom space and materials to facilitate student learning. 

Copyright: Briarwood Enterprises LLC. 

D. Implements and Manages Instruction to Facilitate Higher Order Thinking 
[ ] Most instruction encouraged higher order thinking of all students. 
[ ] Some instruction encouraged some higher order thinking by most students. 
[ ] Little or no instruction encouraged higher order thinking by any students. 


IX. TEACHER ASSESSES AND COMMUNICATES LEARNING RESULTS (Mark one response 
for each section) 

A. Uses Formative Assessments Aligned with Learning Objectives 
[ ] Formative assessment strategies fully aligned with learning objectives; obviously used to adjust instruction. 
[ ] Formative assessment strategies aligned with learning objectives; appeared to be used to adjust instruction. 
[ ] Formative assessment strategies were generally aligned with learning objectives; not clear if or how used to adjust 
instruction. 
[ ] Formative assessment to support student learning not clearly aligned with objectives; appeared to be done without 
intention or done for compliance. 
[ ] No assessment strategies used even though formative assessment was needed to determine level of student learning. 

B. Uses a Variety of Formative and/or Summative Assessments to Measure Student Learning 
[ ] Used assessment strategies which provided all students several opportunities to demonstrate learning. 
[ ] Used assessment strategies which provided most students opportunities to demonstrate learning. 
[ ] Used some assessment strategies which provided some students opportunities to demonstrate learning. 
[ ] Limited use of assessment strategies which provided minimal opportunities for students to demonstrate learning. 
[ ] No assessment strategies used even though formative assessment was needed to determine level of student learning. 

C. Adapts Formative and/or Summative Assessments to Accommodate Diverse Learning Needs and Situations 
[ ] Assessment strategies were obviously adapted to accommodate student diversity and diverse learning needs. 
[ ] Assessment strategies appeared to be adapted to accommodate student diversity and diverse learning needs. 
[ ] Some attempts to adapt assessment strategies to meet diverse needs; however, not successful for all students. 
[ ] Limited attempt to adapt assessment strategies to accommodate student diversity or diverse student needs. 
[ ] No assessment strategies used even though formative assessment was needed to determine level of student learning. 


X. OVERALL CLASSROOM RATING PROFILE (Mark only one) 

[ ] Instruction was effective for all students; evidence that instruction based on clearly defined objectives fully aligned 
with standards; all students engaged in activities requiring higher level thinking skills. 
[ ] Instruction was effective for most students; evidence that instruction based on clearly defined objectives aligned with 
standards; most students engaged in activities which required higher level thinking skills. 
[ ] Instruction was somewhat effective for most students; evidence that instruction was based on student objectives 
somewhat aligned with standards; some opportunity for higher level thinking skills development. 
[ ] Instruction observed was of poor-mediocre quality and effective for only a portion of the students; little evidence that 
instruction was based on student objectives; instruction had a minimal impact on learning. 
[ ] Instruction was of poor quality and was not effective for any students; no evidence that instruction was based on 
student objectives; learning was not based on instruction provided. 


TO IDENTIFY INSTRUCTIONAL ENVIRONMENT CONTEXT ONLY 
PHYSICAL SETTING/CLASSROOM ENVIRONMENT (Mark all that apply in sections A, B, C) 

A. Classroom Facilitates Student Learning 
[ ] Student seating is flexible to allow for [illegible] needs (projects, experimentation, cooperative groups, etc.) 
[ ] Needed utilities are available (water, electricity, etc.) 
[ ] Flat top surfaces are sufficient for experimentation, projects, displays, etc. 

B. Classroom Facility 
[ ] Classroom adequate size for student number 
[ ] Adequate storage for resources/materials/equipment 
[ ] Furnishings allow for activity-based instruction 

C. Classroom Environment 
[ ] Science Materials/Equipment evident 
[ ] Science displays promote learning 
[ ] Science reference books available 
[ ] Student textbooks evident 
[ ] Computers available for student use # ___ 
[ ] Ongoing science projects in evidence 
[ ] Student work displayed 
[ ] Living materials present (according to school policy and when appropriate) 


Mathematics classroom observation instrument 


MATHEMATICS CLASSROOM OBSERVATION INSTRUMENT 
Leadership by Design (NBC Version) 
Copyright: Briarwood Enterprises LLC. 

Level/Class ___ Lesson Title ___ Total # Students ___ Gender: M ___ F ___ 
# Minority ___ # Inclusion ___ Length of Observation ___ 
Learning Objective of the Lesson ___ 

I. LESSON OVERVIEW 

A. Learning Objective of the Lesson (Mark all that apply) 
[ ] Clearly communicated by the teacher using multiple means 
[ ] Communicated orally only 
[ ] Communicated in writing only 
[ ] Student activities consistent with the lesson objective(s) 
[ ] Student activities not consistent with the lesson objective(s) 
[ ] Lesson objective communicated but not clear 
[ ] Lesson objective not communicated 


B. Major Instructional Resource used in the lesson observed (Mark 1, 2, 3... with 1 meaning “primary/predominant 
resource in instruction”) 
[ ] Textbook 
[ ] Other Print Materials (worksheet, manual, etc.) 
[ ] Technology-based presentation media (SMART Board, Power Point, etc.) 
[ ] Document Camera 
[ ] Manipulatives/Hands-on materials 
[ ] Calculators (Other than Graphing) 
[ ] Computer 
[ ] Graphing Calculator 
[ ] Mathematics Centers 
[ ] Technology Probes 
[ ] None of the above 

C. Content Delivery 

D. Place in Instructional Sequence (Mark 1, 2, 3...) 

II. INSTRUCTIONAL OVERVIEW (Mark 1, 2, 3... in each section with 1 meaning “primary/predominant resource 
influencing instruction”) 
A. Instructional Strategy 
[ ] Teacher lecture 
[ ] Teacher demonstration 
[ ] Teacher-led discussion 
[ ] Individual assistance 
[ ] Student presentation 
[ ] Small group discussion 
[ ] Students Solving Problems 
[ ] Other 

B. Student Activity 
[ ] Listening to/observing teacher presentation 
[ ] Participating in discussion (teacher led or small group) 
[ ] Conducting mathematics investigation 
[ ] Completing a skills/practice worksheet (recall or comprehension) 
[ ] Higher-level problem-solving assignment 
[ ] Using hands-on materials to solve problems/verify solutions 
[ ] Applying math to realistic problems 
[ ] Assignment/answering questions from text/other resources 
[ ] Taking test 
[ ] Using computer software program 
[ ] Using the Internet for research 
[ ] Using computer for inputting/analyzing data 

Comments 


III. QUESTIONING 

A. Quality of Questions (Mark ONLY ONE box, record examples of each) 
[ ] Questions were mostly narrow or convergent, focusing on factual recall or one word responses 
[ ] Questions were mostly broad or divergent and stimulated higher cognitive student responses 
[ ] Appropriate balance of factual recall and higher cognitive questions 
[ ] No questions were asked by teacher or posed through the activity being conducted 

B. Questioning Techniques (Mark all that apply) 
[ ] Students are encouraged to ask questions of each other and/or the teacher 
[ ] Questions stimulated higher level and divergent thinking 
[ ] Wait time 
[ ] All students have an opportunity to respond* 
[ ] Most students have an opportunity to respond* 
[ ] Only a few students have an opportunity to respond* 
[ ] Teacher provides focused, descriptive, and qualitative feedback to student responses* 

A. Student Involvement (Mark only one) 

[ ] All or nearly all students demonstrate interest and were engaged 
[ ] Majority of students demonstrate interest, were engaged 
[ ] Approximately equal numbers of students interested/engaged and not interested/not engaged 
[ ] Majority of students uninterested or apathetic; generally not engaged 
[ ] Nearly all of the students were uninterested and not engaged 

B. Classroom Management (Mark only one) 
[ ] Classroom orderly, no student disruptions that impaired the learning environment 
[ ] Classroom generally orderly but some student disruptions that required disciplinary action 
[ ] Classroom disorderly, frequent student disruptions that seriously impaired the learning environment 

C. Classroom Culture/Learner Attitudes Demonstrated (Mark all that apply) 
[ ] Curiosity 
[ ] Cooperation with teacher and/or other students 
[ ] Persistence 
[ ] Responsibility 
[ ] Confidence in ability to “do” math 
[ ] Enthusiasm for learning math 
[ ] Accuracy 
[ ] Use of critical thinking skills 


A. Amount of Problem Solving/Student Investigation/Research (Mark ONLY ONE) 
[ ] Students are engaged in a mathematics problem solving/inquiry experience which may include skills 1-7, however the 
emphasis is on higher level skills 8-17. 
[ ] Students are engaged in problem/inquiry based activity in which the focus of lesson is on the lower level skills 1-7. 
[ ] Students are not involved in any type of problem-solving/inquiry/investigative activity. (If marked, also mark 3B, 4th box) 

B. Level of Student Engagement in Problem Solving/Investigation/Research (Refers back to Part A) (Mark ONLY ONE) 
[ ] Students solve meaningful mathematical or realistic problems through explorations or investigations that can be 
generalized to allow them to make valid conjectures (#14), determine strategies to solve problems (#13), evaluate logical 
consistency (#15) and/or justify/verify solutions (#16). 
[ ] Students discover a mathematics phenomenon using a planned activity that requires [illegible] strategy, 
[illegible] connections between mathematics ideas or collecting [illegible] 
[ ] Students learn a mathematics concept using a preplanned activity that provides a definitive procedure and 
requires a specific response to be correct. 
[ ] Students are not involved in any type of problem solving/inquiry/investigative activity. 


C. Mathematical Skills Being Developed (Mark all skills which are introduced/developed in the observed lesson) 

Basic Skills: (Mark all that are observed) 
[ ] 1. [illegible] 
[ ] 2. Reciting/recalling facts 
[ ] 3. Classifying 
[ ] 4. Measuring/estimating 
[ ] 5. Coordinate Graphing 
[ ] 6. Constructing charts/graphs 
[ ] 7. Computing/calculating 

Higher Level Skills: (Mark all that are observed) 
[ ] 8. Collecting/recording data 
[ ] 9. Interpreting/analyzing data/statistics 
[ ] 10. Investigating (Hands-on, Tech) 
[ ] 11. Applying Theorems/principles 
[ ] 12. Evaluating the Relevancy of data 
[ ] 13. Determining problem solving strategy 
[ ] 14. Creating/formulating pattern 
[ ] 15. Evaluating logical consistency 
[ ] 16. Justifying/verifying solutions 
[ ] 17. Interpretive Discussion 


TEACHER DEMONSTRATES APPLIED CONTENT KNOWLEDGE (Mark one response for each section) 

A. Communication 
[ ] Consistently used accurate and effective communication; vocabulary is clear, correct and appropriate. 
[ ] Generally used accurate and effective communication; occasional use of inappropriate vocabulary. 
[ ] Consistently used inaccurate and ineffective communication and/or inappropriate vocabulary. 

B. Connects Content to Life Experiences 
[ ] Consistently connected most content/procedures/activities with relevant life experiences. 
[ ] Connected some content/procedures/activities with relevant life experiences. 
[ ] Rarely or never connected content/procedures/activities with relevant life experiences. 

C. Instructional Strategies Appropriate for Content and Contribute to Student Learning 
[illegible] 


VI. TEACHER CREATES AND MAINTAINS LEARNING CLIMATE (Mark one response for each section) 

A. Communicates High Expectations 
[ ] Significant/challenging lesson objectives; teacher consistently communicates confidence in students’ ability to achieve. 
[ ] Challenging objectives; some communication of confidence in students’ ability to achieve. 
[ ] Minimal objectives for students; rarely or never communicates confidence in students’ ability to achieve. 

B. Establishes a Positive Learning Environment 
[ ] Clear conduct standards; awareness of student behavior, responded appropriately/respectfully. 
[ ] Conduct standards but some inconsistency in monitoring and response to student behavior. 
[ ] No established conduct expectations; minimal or no monitoring; inappropriate responses to behavior. 

C. Values and Supports Student Diversity 
[ ] Recognized and consistently responded to the diversity in the class (gender, ethnicity, academic and physical abilities); 
[ ] Little or no recognition or response to student diversity and individual needs; used the same approach for all students. 

D. Fosters Mutual Respect Between Teacher and Students and Among Students 
[ ] Always treated all students with respect, encouraged and clearly expected students to treat each other with respect. 
[ ] Generally treated students with respect, some encouragement of students to treat each other with respect. 
[ ] Did not show respect or concern for students; little or no encouragement of students to treat each other with respect. 

E. Provides a Safe Environment for Learning 
[ ] Classroom environment was emotionally and physically safe for students at all times. 
[ ] Classroom environment was emotionally and physically safe for students most of the time. 
[ ] Classroom environment was not emotionally and/or physically safe for students. 


VIII. TEACHER IMPLEMENTS AND MANAGES INSTRUCTION (Mark one response for each section) 

A. Implements Instruction Based on Student Needs and Assessment Data 
[ ] Instruction addressed individual student needs; always used or attempted to use an appropriate instructional strategy to 
meet individual student needs; adapted instruction to changing or unanticipated circumstances. 
[ ] Instruction addressed most individual student needs; used more than one strategy as needed; sometimes adapted 
instruction to meet changing or unanticipated circumstances. 
[ ] Instruction did not address individual student needs; one strategy was used for all students; no attempt to adapt lesson to 
meet changing or unanticipated circumstances. 

B. Uses Time Effectively 
[ ] Always used efficient procedures for non-instructional tasks (handling materials/supplies, managing 
transitions, organizing work, etc.) so there is minimal loss of learning time. 
[ ] Inconsistently used efficient procedures for non-instructional tasks causing some loss of learning time. 
[ ] Used inefficient procedures for non-instructional tasks resulting in significant loss of learning time. 

C. Uses Space and Materials 
[ ] Classroom space and/or materials were not always used effectively to facilitate student learning. 
[ ] Ineffective use of classroom space and materials to facilitate student learning. 

D. Implements and Manages Instruction to Facilitate Higher Order Thinking 
[ ] Some instruction encouraged some higher order thinking by most students. 
[ ] Little or no instruction encouraged higher order thinking by any students. 


IX. TEACHER ASSESSES AND COMMUNICATES LEARNING RESULTS (Mark one response for each section) 

A. Uses Formative Assessments Aligned with Learning Objectives 
[ ] Formative assessment strategies fully aligned with learning objectives; obviously used to adjust instruction. 
[ ] Formative assessment strategies aligned with learning objectives; appeared to be used to adjust instruction. 
[ ] Formative assessment strategies were generally aligned with learning objectives; not clear if or how used to adjust 
instruction. 
[ ] Formative assessment to support student learning not clearly aligned with objectives; appeared to be done without 
intention or done for compliance. 
[ ] No assessment strategies used even though formative assessment was needed to determine level of student learning. 

B. Uses a Variety of Formative and/or Summative Assessments to Measure Student Learning 
[ ] Used assessment strategies which provided all students several opportunities to demonstrate learning. 
[ ] Used assessment strategies which provided most students opportunities to demonstrate learning. 
[ ] Used some assessment strategies which provided some students opportunities to demonstrate learning. 
[ ] Limited use of assessment strategies which provided minimal opportunities for students to demonstrate learning. 
[ ] No assessment strategies used even though formative assessment was needed to determine level of student learning. 

C. Adapts Formative and/or Summative Assessments to Accommodate Diverse Learning Needs and Situations 
[ ] Assessment strategies were obviously adapted to accommodate student diversity and diverse learning needs. 
[ ] Assessment strategies appeared to be adapted to accommodate student diversity and diverse learning needs. 
[ ] Some attempts to adapt assessment strategies to meet diverse needs; however, not successful for all students. 
[ ] Limited attempt to adapt assessment strategies to accommodate student diversity or diverse student needs. 
[ ] No assessment strategies used even though formative assessment was needed to determine level of student learning. 


X. OVERALL CLASSROOM RATING PROFILE (Mark only one) 

[ ] Instruction was effective for all students; evidence that instruction based on clearly defined objectives fully aligned 
with standards; all students engaged in activities requiring higher level thinking skills. 
[ ] Instruction was effective for most students; evidence that instruction based on clearly defined objectives aligned with 
standards; most students engaged in activity that required or offered opportunity to develop higher level thinking skills. 
[ ] Instruction was somewhat effective for most students; evidence that instruction was based on student objectives 
somewhat aligned with standards; little opportunity for higher level thinking skills development. 
[ ] Instruction observed was of mediocre quality and effective for only a portion of the students; little evidence that 
instruction was based on student objectives; instruction had a minimal impact on learning. 
[ ] Instruction was of poor quality and was not effective for any students; no evidence that instruction was based on 
student objectives; learning was not based on instruction provided. 


A. Classroom Facilitates Student Learning 
[ ] [illegible] needs (individual work, cooperative groups, etc.) 
[ ] Adequate electrical and any other needed utilities available 
[ ] Flat top surfaces are sufficient for working with hands-on materials of problems, projects, models, displays, etc. 

B. Classroom Facility 
[ ] Adequate storage for resources/materials/equipment 
[ ] Furnishings allow for activity-based instruction ([illegible], etc.) 


Appendix B: Rubric for scoring classroom observations 


Science rubric 


Scoring Rubric — Science Classroom Observations 


Scores of 5-1 reflect the perceived status of instruction; NO = Not observed but could have contributed to the lesson; NA = Not observed 
and not applicable for meeting lesson objective 
Objectives for the lesson No particular objective 
were appropriate but not 
communicated in at least one |fully communicated and not 
way; student activities were [readily apparent to the 
students; student activities 
were generally consistent 


and of interest to nearly all of 
the students. 


The lesson was well 
designed to achieve the 
lesson objectives; 
appropriate highly effective 


emotionally and physically 
safe for students most of the 
time. 


Instruction addressed all individual student needs; always used or attempted to use a variety of appropriate instructional strategies to meet individual student needs; adapted instruction to changing or unanticipated circumstances 


Instruction addressed most individual student needs; used different instructional strategies as needed to meet needs of most students; sometimes adapted instruction to meet changing or unanticipated circumstances 


Instruction addressed many individual student needs; used more than one instructional strategy as needed; occasionally adapted instruction to meet changing or unanticipated circumstances 


Instruction addressed some individual student needs; attempted to use more than one instructional strategy; seldom adapted instruction to meet changing or unanticipated circumstances 


Instruction did not address individual student needs; one strategy was used for all students; no attempt to adapt lesson to meet changing or unanticipated circumstances 

Always used efficient procedures for non-instructional tasks; no loss of learning time was observed; classroom space and materials were always used effectively to facilitate student learning. 


Used efficient procedures for non-instructional tasks most of the time; minimal loss of learning time was observed; classroom space and materials were used effectively to facilitate student learning. 


Generally used efficient procedures for non-instructional tasks with some loss of learning time; classroom space and materials were used effectively most of the time. 


Used both efficient and inefficient procedures for non-instructional tasks resulting in significant loss of learning time; classroom space and/or materials were used effectively to facilitate student learning some of the time. 


... resulting in major loss of learning time; classroom space and materials were not used effectively to facilitate student learning. 


Instruction encouraged 
higher order thinking by all 
students; included 
significant amount of 
independent and/or group 
processing and reflection 
time. 


Instruction encouraged 
higher order thinking by most 
students; included some 
independent or group 
processing and reflection 
time. 


Instruction encouraged 
higher order thinking by 
some students; included 
minimal independent or 
group processing and 
reflection time. 


Instruction encouraged 
higher order thinking by 
only a few students; little, if 
any, independent or group 
processing or reflection 
time was provided. 


Instruction was minimal 
and ineffective; did not 
encourage higher order 
thinking by any students; 
did not include any 
independent/group 
processing or reflection 
time 


strategies were fully 
aligned with learning 
objectives; assessment 


strategies were aligned with 
learning objectives; appeared 
to be used to adjust 


Used various formative 


Formative and/or summative 


assessment strategies 


appeared to be adapted to 
accommodate student 


diversity and diverse learning 


how assessment results 


strategies not clearly 
aligned with learning 


strategies were generally 
aligned with learning 
objectives; not clear if or 


strategies that provided 
many students (at least the 
majority) opportunities to 
demonstrate learning. 


Some attempts were made 
to adapt formative and/or 
summative assessment 
strategies to meet diverse 
needs however, these were 
not successful for all 
students. 


were used even though 
formative assessment 


objectives; appeared to be 
done without intention or 


was needed to determine 
the level of student 


Mathematics rubric 


Scoring Rubric - Mathematics Classroom Observations 
Scores of 5-1 reflect the perceived status of instruction; Score of NO = Not observed but could have contributed to the lesson; 
Score of NA = Not observed and not appropriate for meeting the lesson objective 
Copyright: Briarwood Enterprises LLC. 


for and of interest to half or 
more of the students. 
solving or were not 
involved in any type of 
problem solving/ 


however, the focus of the 
instruction was on lower 
level skills during most of 
the period; interpretive 
discussion was minimal or 


engaged in an authentic problem solving experience and were required to analyze and interpret real-world data. They were actively involved in 
C. Values and 
Supports Student 


Recognized and 
consistently responded to 
the diversity in the class, 
consistently used or 
attempted to use strategies 
to address the needs of all 
students. 


Used instructional 

strategies that were 

generally appropriate for 
if 


Recognized but 
inconsistently responded 
to the diversity in the 
class; used or attempted 


effectively most of the 
time. 


students. 
Flexible student furnishings can 
accommodate any type of mathematics 
activity to provide for maximum student 
and/or teacher interactions. 


Classroom is large with sufficient 
storage for supplies; materials are well 
organized for ease of access, 
classroom furnishings are appropriate 
for problem solving and/or hands-on 
activities and materials are available for 


mecbobic tate a acteshoconeation, eg, 
graphing calculators, some mathematics 


Student furnishings are not flexible and in 
many cases limit the interactions needed for 
quality mathematics instruction. 


Classroom is inadequate in size with little or no 


storage; little, if any, organization of materials 


or extremely limited; no evidence of 
mathematics displays and no student work was 
posted. 


Appendix C. A pilot test of the Leadership by 
Design scoring rubric for assessment of 
instructional quality 


Linda Cavalluzzo, Stephen Henderson, and Christine Mokher 
January 2010 


Overview 


This is a report on a teacher observation pilot conducted for the study “The Contribution 
to Teacher Effectiveness of National Board Certification,” which will examine the impact 
on teaching effectiveness of going through the National Board Certification (NBC) 
application process. One aspect of the evaluation involves a comparison of classroom 
observations from a sample of teacher NBC candidates with those of similar teachers not 
pursuing this certification. The goal of this part of the study is to chart the observed use 
of effective instructional practices as teachers move through the NBC process, as compared 
to non-NBC applicant teachers with similar characteristics in similar classroom settings. 
Changes in instructional quality will be examined for science teachers in 34 schools in 
Kentucky (17) and Chicago (17) over a three-year period. Growth in instructional quality for 
NB-involved teachers will be compared to that of teachers who are not involved with the NB 
process to draw conclusions about the gains in instructional quality made by science 
teachers as a result of participation in the certification process. 


This study design requires the use of a comprehensive observation instrument to document 
what is observed, a tool for assigning numeric scores to the instructional practices observed, 
and consistent and reliable data collection and scoring procedures to maintain the internal 
validity of these data. The Leadership by Design (LBD) Science Classroom Observation In- 
strument, modified to ensure consistency with the NB science standards, has been selected 
for use in the study. This instrument has been widely used in Kentucky and elsewhere; class- 
room observation data have been collected using the LBD instrument for over 3,000 teach- 
ers in more than 250 elementary, middle, and high schools in 7 different states. Projects 
utilizing the LBD include work funded by the U.S. Department of Education and the Na- 
tional Science Foundation. The LBD has also been adopted by the National Science Teach- 
ers Association as a program improvement tool to help assess and improve the quality of 
instruction in middle school and high school classrooms. 


In contrast to the extensively used LBD, the instrument for assigning numeric scores to the 
observation data (the LBD Classroom Observation Rubric) was newly developed for this 
study. Thus, we have conducted pilot observations for a small sample of science teachers to 
identify any problems transferring the observation data to the rubric and to ensure that the 
scoring data are internally consistent. We are using these data to identify and address any 
issues with the rubric itself or with the procedures for translating the LBD instrument data 
into our scores on the rubric prior to conducting the observations for the study. In the 
actual study, the scoring of instruction will be based on classroom observations and 
supporting information obtained from the teacher debrief interviews and a review of lesson 
plans and sample assessments. 


For this pilot study, the developer of LBD, Dr. Henderson, trained five observers in use of 
the LBD and scoring rubric. As described in more detail in the appendix [Table 11] to this 
report, the five observers are all experienced science educators who also have used the LBD 
instrument for previous studies. The observers did not collect additional materials or de- 
brief the teachers following the classroom observation, as they will be expected to do to in- 
crease the reliability of the results in the actual study. 


Completed LBD observation instruments and scoring rubrics were collected by Dr. Hender- 
son from the classroom observers following their classroom visits. Copies of the completed 
data collection instruments were provided to CNA for independent analysis. 


Summary of Major Findings 


Overall, no major concerns were identified with the use of the rubric in the pilot 
observations. Using the LBD and scoring rubric, 

• Observers were able to distinguish the level of instructional quality among science 
  classrooms. 
    o 56 percent of the individual items rated on the scoring rubric (N=21) had 
      ratings that covered the entire range of possible scores from 1 to 5. 
    o All individual items had a range of at least two points. 
    o Among the 9 subscales, the minimum scores were between 1.0 and 2.7, while 
      the maximum scores were between 4.7 and 5.0. 

• Missing data were minimal. 
    o For 7 out of 11 of the scales and subscales, none of the 9 observed 
      classrooms had any N/A or missing ratings. 
    o For 3 out of 11 of the scales and subscales, there was 1 observation with an 
      N/A or missing rating for 1 item. 
    o For the remaining subscale, “IX. Assesses Learning,” 4 of the 9 observed 
      classrooms were rated “N/A” or were missing a mark on the LBD for one or 
      more items. However, during the pilot, observed teachers were not asked to 
      provide a sample assessment for review by the observer. In the actual study, 
      teachers will be asked in advance to have this information on hand for the 
      observer. 

• Overall ratings were consistent with ratings on subscales. 
    o The overall rating for quality of instruction is not expected to be the average 
      of the subscales. Nevertheless, we would expect teachers who receive high 
      overall ratings to tend to have high ratings on each of the subscales. We 
      found no anomalies in the ratings. 
        * The average subscale ratings were 0.7 to 2.9 points higher for 
          teachers with high overall instructional ratings (score of 4 or 5) 
          than for teachers with low overall instructional ratings (score of 
          2 or 3). 

• Scores exhibited high face validity. 
    o The pilot sample consisted of classroom observations for nine science 
      teachers, two NBCTs and seven others. Observers were not aware of which 
      teachers were the NBCTs. Because NBCTs have been certified for the quality 
      of their professional practices, we would expect them to score well on the 
      LBD and higher, on average, than teachers who have not gone through the NB 
      process. We found that, 
        * Both of the National Board teachers had an overall instructional 
          rating of 5, the highest possible rating, compared to a mean of 3.3 
          for the Non-National Board teachers. 
        * The average subscale ratings for the National Board teachers were 
          also higher than the Non-National Board teachers’ average rating for 
          each of the 9 subscales. 


A second training session for the observers will be conducted before the actual data collec- 
tion to ensure that observers have been refreshed on how to score the rubric and to address 
a few minor issues that were uncovered during the pilot observations, which will be dis- 
cussed in more detail in this report. 


Below we provide more detail on major findings from the pilot study and describe the ob- 
servation process, examine variation in scores, document the extent of missing data and 
items marked N/A, provide context for understanding the overall ratings, examine the in- 
ternal consistency of the ratings, assess the face validity of the results, demonstrate how 
sample results may be displayed in the final report, and discuss the conclusions and impli- 
cations for the study. 


Observation Process 


The observation team for the study consists of seven experienced science educators who 
have been trained in the use of the LBD instrument and have conducted observations in 
actual classroom environments. Though our observers are experienced and well-qualified, 
we provided a full-day training session in October on the NBC-LBD Classroom Observation 
Instrument and the LBD Classroom Observation Rubric. Prior to the start of study observa- 
tions, further training will be provided to address and correct the issues identified in this 
pilot study. 


For these pilot observations, classroom observations were collected from a nonrandom 
sample of nine (9) middle school and high school science teachers in Blount County 
(Maryville), Tennessee, and Fayette County (Lexington), Kentucky. Two of these teachers 
had National Board certification. The pilot observations were conducted by five of our 
trained observers, all of whom have previous experience teaching science and conducting 
classroom observations, as described in the appendix [Table 11]. 


During the classroom observation, the observer filled out the LBD instrument, marking 
items as they were observed. Observers were instructed to mark a response for every item. 
Following each classroom observation, the observer reflected on the observation and, using 
the completed LBD instrument, filled out the LBD Classroom Observation Rubric. Note 
that the LBD acts as a memory device for the observer when filling out the scoring rubric; 
the data collected from the LBD are not used directly in the study. For actual study observa- 
tions, the observer will also obtain planning materials and assessments from the teacher, 
and conduct a short debrief following the observation. These materials and the discussion 
with the teacher will enable the observer to better understand what was observed, facilitat- 
ing more accurate completion of the rubric. 


Each item on the rubric is scored on an integer scale of 1-5, with 5 being the highest rating 
and 1 the lowest. If an item was not observed, it is marked as “Not Applicable” and is not 
assigned a numeric score. The rubric consists of 9 instruction-related subscales which are 
based on the average rating of 3 to 5 specific items aligned with the LBD instrument. The 
rubric also has an overall quality of instruction rating, and a subscale for the physical set- 
ting. The physical setting rating is collected to provide baseline contextual information and 
is not used to evaluate the teacher or quality of instruction. 
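

The scoring rule described above can be illustrated with a minimal sketch. It assumes that 
each item rating is stored as an integer from 1 to 5, or as None when the item was marked 
“Not Applicable” or left blank, and that such items are simply left out of the subscale 
average. The function name and the example ratings are hypothetical and are not taken 
from the study data. 

```python
from statistics import mean
from typing import Optional, Sequence


def subscale_score(item_ratings: Sequence[Optional[int]]) -> Optional[float]:
    """Average the rated items in one subscale, skipping N/A (None) items."""
    rated = [r for r in item_ratings if r is not None]
    for r in rated:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    # A subscale with no rated items has no score at all.
    return round(mean(rated), 1) if rated else None


# Hypothetical three-item subscale in which one item was marked N/A.
example = subscale_score([4, None, 5])
```

In this sketch a subscale with every item marked N/A returns None rather than a number, 
which keeps an unscored subscale from being mistaken for a low score. 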


Variation in Scores 


Sufficient variation in scores is needed to distinguish differences in teachers’ instructional 
quality. We examined the distribution of scores for each item and subscale on the rubric. 
Fifty-six percent of the individual items (N=21) had ratings that covered the entire range of 
possible scores from 1 to 5. All individual items had a range of at least two points. Among 
the 9 subscales, the minimum scores were between 1.0 and 2.7, while the maximum scores 
were between 4.7 and 5.0 (see Table 9). The range of ratings for the subscales was between 
2.3 and 4.0, against a maximum possible range of 4.0 on the 1-to-5 scale. The distribution 
of scores was similar for the Physical Setting and Overall Rating. These findings indicate 
that variation is present in all of the ratings. 


Missing Data and Items Marked “N/A” 


Missing ratings or items marked as “N/A” were excluded from the averages that were calcu- 
lated for each subscale. If too many of these items are excluded from a subscale, then the 
corresponding rating may not be a reliable indicator of the construct it is designed to 
measure. A teacher’s average rating on a subscale may also be disproportionately influ- 
enced by the score on a single item if data are missing for other items in the scale. We 
checked for patterns in missing and “N/A” ratings by examining which items were most 
commonly missing and whether any individual observers reported an unusually high num- 
ber of missing or “N/A” ratings. Any issues identified may indicate a need to revise specific 
items or provide additional training to the observers about how to score them. 
Table 9: Subscale, physical setting, and overall rating statistics (total number of items, 
minimum, maximum, range, and average). 


                               Total # 
                               Items    Minimum   Maximum   Range   Average 
I. Lesson Overview               5        2.2       5.0      2.8      4.0 
II. Instructional Overview       4        1.0       5.0      4.0      3.6 
III. Questioning                 4        2.0       5.0      3.0      3.7 
IV. Classroom Atmosphere         3        2.7       5.0      2.3      4.2 
V. Higher Order Skills           3        2.0       4.7      2.7      3.3 
VI. Content Knowledge            4        2.5       5.0      2.5      3.5 
VII. Positive Climate            5        2.5       5.0      2.5      4.1 
VIII. Implements Instruction     4        2.5       5.0      2.5      3.7 
IX. Assesses Learning            3        1.5       5.0      3.5      3.4 
Physical Setting                 3        1.7       5.0      3.3      4.1 
Overall Rating                   1        2.0       5.0      3.0      3.7 
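

The Range column in Table 9 is the maximum rating minus the minimum rating. As a small 
worked check, the sketch below recomputes the range for four of the subscale rows and 
finds the narrowest and widest of them. The dictionary layout and function name are 
illustrative assumptions; only the minimum and maximum values come from the table. 

```python
# (minimum, maximum) pairs for four subscales, as reported in Table 9.
table9_rows = {
    "I. Lesson Overview": (2.2, 5.0),
    "II. Instructional Overview": (1.0, 5.0),
    "IV. Classroom Atmosphere": (2.7, 5.0),
    "V. Higher Order Skills": (2.0, 4.7),
}


def score_range(minimum: float, maximum: float) -> float:
    """Range of observed ratings, rounded to one decimal as in the table."""
    return round(maximum - minimum, 1)


ranges = {name: score_range(lo, hi) for name, (lo, hi) in table9_rows.items()}
narrowest = min(ranges.values())  # IV. Classroom Atmosphere
widest = max(ranges.values())     # II. Instructional Overview
```

These four rows include both ends of the 2.3 to 4.0 span of subscale ranges reported in 
the text. 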


Table 10 shows the number of teachers who had N/A or missing ratings for 0, 1, 2, or 3 
items for each of the scales and subscales. For 7 out of 11 of the scales and subscales, none 
of the 9 teachers had any N/A or missing ratings. For 3 out of 11 of the scales and 
subscales, there was 1 teacher with an N/A or missing rating for 1 item. The remaining subscale, 
“IX. Assesses Learning”, was more problematic, with 4 of the teachers marked as “N/A” or 
missing for one or more items. 


Table 10: Total number of items, and number of teachers with N/A or missing ratings for 0, 1, 
2, or 3 items: by scale or subscale 


                               Total #   # Teachers with N/A or missing ratings for: 
                               Items     0 items   1 item   2 items   3 items 
I. Lesson Overview               5          9         0        0         0 
II. Instructional Overview       4          9         0        0         0 
III. Questioning                 4          9         0        0         0 
IV. Classroom Atmosphere         3          9         0        0         0 
V. Higher Order Skills           3          8         0        1         0 
VI. Content Knowledge            4          8         1        0         0 
VII. Positive Climate            5          7         2        0         0 
VIII. Implements Instruction     4          9         0        0         0 
IX. Assesses Learning            3          5         1        1         2 
Physical Setting                 3          8         1        0         0 
Overall Rating                   1          9         0        0         0 


NOTE: A total of 9 teachers were observed. 


However, during the pilot observations the teachers were not asked to provide a sample 
assessment in advance, so the observers may have been unable to assign a rating for these 
items if the teacher did not have an assessment available to review. Before the actual 
observations are conducted the teachers will receive a letter with instructions regarding materials 
they should have available, so we do not anticipate the same problem with N/A and missing 
ratings for this subscale. 
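

A tally of the kind shown in Table 10 could be produced along the following lines. This 
is a minimal sketch, assuming that each teacher's ratings for one subscale are held in a 
list with None standing for an N/A or missing rating. The function name and the example 
data are hypothetical and do not reproduce the pilot records. 

```python
from collections import Counter
from typing import List, Optional, Sequence


def missing_item_tally(
    ratings_by_teacher: Sequence[Sequence[Optional[int]]], n_items: int
) -> List[int]:
    """Count teachers with 0, 1, ..., n_items missing ratings on a subscale."""
    counts = Counter(
        sum(rating is None for rating in ratings)
        for ratings in ratings_by_teacher
    )
    # Position k holds the number of teachers with exactly k missing items.
    return [counts.get(k, 0) for k in range(n_items + 1)]


# Hypothetical three-item subscale observed for four teachers.
example_ratings = [
    [4, 5, 3],           # no missing items
    [None, 2, 3],        # one missing item
    [None, None, None],  # all three items missing
    [5, 5, 4],           # no missing items
]
tally = missing_item_tally(example_ratings, n_items=3)
```

Each row of such a tally should sum to the number of teachers observed, which gives a 
quick consistency check on the table. 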


Context for Understanding Overall Ratings 


After rating each of the items on the rubric, observers were asked to assign an overall rating 
of quality of instruction. This rating is designed to take into account the observer’s overall 
impression including the effectiveness of instruction, alignment with objectives and stand- 
ards, student engagement, and development of higher order thinking skills. We examined 
the observer’s written comments and responses on the LBD instrument to provide some 
context for understanding what was scored. Below are two examples of how the classroom 
observations corresponded to the overall instructional ratings. 


Teacher 1 taught a lesson entitled “What is friction?” to a grade 7 science class. The objec- 
tive of the lesson was to describe how the mass of an object can affect the outcome of colli- 
sions. The students worked in small groups on a lab assignment to study collisions using 
marbles. However, mass was never measured, only judged by the size of a marble. The stu- 
dent investigations focused on basic skills such as “observing” and “inferring” instead of 
higher level skills like “formulating hypotheses” and “interpreting data.” Despite these limi- 
tations, nearly all of the students were engaged, the learning objectives were clearly com- 
municated using multiple means, the teacher communicated effectively with the students, 
and a formative assessment was observed during a closure discussion with the whole class. 
The teacher received an overall instructional rating of 3. 


Teacher 3 taught a lesson on enzymes to a high school biology class. Students worked in 
small groups to investigate how variables affect enzyme activity by designing and perform- 
ing their own experiments. The emphasis in these investigations was on higher-level skills 
such as “evaluating data” and “interpretive discussion.” All students were encouraged to ask 
questions, and the questions stimulated higher level and divergent thinking. The observer 
described the classroom culture as “enthusiasm for learning” and “curiosity.” The teacher 
clearly communicated the learning objectives using multiple means and used formative as- 
sessment that was fully aligned with these learning objectives. The teacher received an over- 
all instructional rating of 5. 


The report will also describe differences in the types of activities observed in the classrooms 
of teachers with high ratings and low ratings. The LBD instrument asks observers to identify 
both the instructional strategies used by the teacher and the activities performed by stu- 
dents during the class. 


Internal Consistency 


The overall rating for quality of instruction is not expected to be the average of the 
subscales. For example, suppose a teacher has clear objectives, assigns activities that promote 
higher level skills, asks challenging questions, and demonstrates strong content knowledge, 
but none of the students are engaged or following the lesson. The teacher would likely 
receive high ratings for many of the subscales except “classroom atmosphere”, so the average 
of the subscale ratings would be relatively high. However, if the students do not appear to 
be learning much from the lesson, the observer may perceive a lower overall quality of 
instruction. 


Even though there is not an exact match between the overall instructional rating and the 
average of the subscales, we would expect that teachers who receive high overall ratings 
would tend to have high ratings on each of the subscales. In order to examine the internal 
consistency of the ratings, we compared the average rating for each of the subscales be- 
tween teachers with low overall instructional ratings (score of 2 or 3) and high overall in- 
structional ratings (score of 4 or 5). The subscales for teachers with high overall ratings 
were 0.7 to 2.9 points higher compared to teachers with low overall ratings (see Figure 10). 
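

The comparison described above can be sketched as follows. The sketch assumes that each 
teacher is represented by a dictionary holding an overall rating and one subscale average, 
and that overall ratings of 4 or 5 count as high and ratings of 3 or below count as low. 
The field names and the example values are hypothetical and are not the pilot data. 

```python
from statistics import mean
from typing import Dict, List


def high_low_gap(teachers: List[Dict[str, float]], subscale: str) -> float:
    """Mean subscale rating of high-overall teachers minus that of low-overall teachers."""
    high = [t[subscale] for t in teachers if t["overall"] >= 4]
    low = [t[subscale] for t in teachers if t["overall"] <= 3]
    return round(mean(high) - mean(low), 1)


# Hypothetical teachers: overall rating plus the average for one subscale.
example_teachers = [
    {"overall": 5, "questioning": 4.5},
    {"overall": 4, "questioning": 4.0},
    {"overall": 3, "questioning": 3.0},
    {"overall": 2, "questioning": 2.5},
]
gap = high_low_gap(example_teachers, "questioning")
```

A positive gap on every subscale is the pattern the internal consistency check looks for. 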


Figure 10: Average rating on each subscale for teachers with low and high overall instructional 
ratings. 


[Figure not reproduced. Legend: Low Overall Rating (2-3); High Overall Rating (4-5). 
Categories: I. Lesson Overview; II. Instructional Overview; III. Questioning; 
IV. Classroom Atmosphere; V. Higher Order Skills; VI. Content Knowledge; 
VII. Positive Climate; VIII. Implements Instruction; IX. Assesses Learning.] 


Face Validity 


Face validity is assessed by examining outcomes to consider whether a measure appears 
to assess what it is designed to assess. In the early stages of selecting an instrument for the 
observation, a crosswalk was created to show that many of the same standards used in 
National Board certification are captured on the LBD instrument. Thus, we would expect 
teachers with National Board certification to score highly on the ratings from the classroom 
observations in this study. 


Two of the nine teachers observed in the pilot observations were National Board certified, 
although the observers did not know of the teachers’ certification status until after the ob- 
servations were conducted. Figure 11 shows how the average of the National Board teach- 
ers’ ratings compared to the mean for Non-National Board teachers, as well as the 
minimum and maximum ratings for the sample. Both of the National Board teachers had 
an overall instructional rating of 5 compared to a mean of 3.3 for the Non-National Board 
teachers. The average subscale ratings for the National Board teachers were also higher 
than the Non-National Board teachers’ average rating for all 9 of the subscales. 
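

As a rough arithmetic check, the two group means reported above can be pooled to recover 
the overall-rating average for the full pilot sample. The sketch below assumes the rounded 
mean of 3.3 for the seven Non-National Board teachers, so the result is approximate. The 
variable names are illustrative. 

```python
# Group sizes and mean overall ratings as reported for the pilot sample.
n_national_board, mean_national_board = 2, 5.0
n_other, mean_other = 7, 3.3  # 3.3 is a rounded mean, so the result is approximate

# Weighted (pooled) mean across all nine observed teachers.
pooled_mean = (
    n_national_board * mean_national_board + n_other * mean_other
) / (n_national_board + n_other)
pooled_mean_rounded = round(pooled_mean, 1)
```

The pooled value rounds to 3.7, which agrees with the average Overall Rating listed in 
Table 9. 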


Figure 11: Comparison of ratings for the Non-National Board average, sample 
minimum/maximum, and National Board average, by subscale. 


[Figure not reproduced. Legend: Non-National Board Average; Sample Min/Max; National Board Average.] 
Sample Results 


The ratings from the pilot observations represent observations taken at a single point in 
time, whereas the study will track teachers over time and will include an average of two ob- 
servations per teacher in each time period. The final report will show how the ratings 
changed for teachers with different types of National Board participation. Tests of statistical 
significance will be conducted to determine if there are differences in the change over time 
between teachers with no involvement and teachers with various levels of involvement in the 
certification process. 


Figure 12 provides an example of how the information may be displayed graphically for the 
overall rating of instruction. In this hypothetical case, the teachers with no involvement in 
the National Board process begin with an average rating of 3.0 in year 1 and show little 
improvement over time, with subsequent scores within 0.1 points. Across all National Board 
applicants, there is a 0.4 point increase over time, from 3.7 to 4.1. However, when the results 
are disaggregated among different stages of applicants, the increase in scores is largely 
attributable to a single group. The teachers who changed from “new applicant to re-applicant 
to certified” demonstrated the greatest growth, with average ratings of 3.5 in year 1, 4.0 in 
year 2, and 4.5 in year 3, for a total change of 1.0 point over three years. The teachers 
whose status changed directly from “applicant to certified” were good teachers at the 
beginning of the process and did not change much over time, with average scores between 4.5 
and 4.6 in all years. The teachers whose status changed from “applicant to withdraw” had 
ratings similar to those of the non-applicant teachers, with a rating of 3.0 in year 1 and 3.1 in 
years 2 and 3. 
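

The changes quoted in this hypothetical example can be recomputed directly. The sketch 
below uses only the year 1 and year 3 averages stated in the paragraph above, which are 
themselves hypothetical. The group labels and the function name are illustrative. 

```python
from typing import Dict, Tuple

# (year 1, year 3) average overall ratings from the hypothetical example in the text.
hypothetical_ratings: Dict[str, Tuple[float, float]] = {
    "All applicants": (3.7, 4.1),
    "Applicant to re-applicant to certified": (3.5, 4.5),
    "Applicant to withdraw": (3.0, 3.1),
}


def total_change(year1: float, year3: float) -> float:
    """Change in the average rating from year 1 to year 3, to one decimal place."""
    return round(year3 - year1, 1)


changes = {
    group: total_change(y1, y3) for group, (y1, y3) in hypothetical_ratings.items()
}
largest_growth = max(changes, key=changes.get)
```

The sketch only reproduces the descriptive differences. It does not stand in for the tests 
of statistical significance planned for the final report. 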


Figure 12: Sample figure for overall instructional ratings in years 1, 2, and 3; by National Board 
participation status (Hypothetical data). 


[Figure not reproduced. Legend: Year 1; Year 2; Year 3. Categories: Non-Applicant; 
All Applicants; Applicant to Certified; Applicant to Re-Applicant to Certified; 
Applicant to Withdraw.] 


Conclusion & Implications 


Overall, there were no major issues with the use of the rubric in the pilot observations. The 
ratings revealed variation in scores across teachers, there were no systematic patterns of 
missing data, the observer comments reflected the corresponding ratings, there was inter- 
nal consistency between the overall instructional ratings and the subscale ratings, and face 
validity was established among observed outcomes. 


The research team has identified several changes that should be made before the observa- 
tions for the study are collected. The changes include the following: 
• Requiring observers to write comments corresponding to the overall rating to 
  provide context for understanding why the rating was selected. 

• Reminding the observers to check the consistency between the LBD instrument 
  and the scoring rubric. One observer assigned an overall rating of 4 on the 
  instrument and 5 on the rubric for the same teacher. 

• Emphasizing the importance of collecting sample assessments from the teacher 
  so that ratings can be assigned to the section on assessment. 

• Changing the instructions on the “Instructional Overview” section of the 
  instrument from “Mark only one” to “Mark all that apply” for the instructional 
  resources used. Observers may still be asked to distinguish which activities or 
  strategies were primarily used, but selecting all that were observed will provide a 
  better understanding of what occurred in the lesson. 

• Reviewing with the observers how to score sections on the LBD that are not as 
  closely aligned with the rubric. For a few of the individual items on the subscales 
  (e.g. 1c “Content Delivery”), reviewers checked similar boxes on the LBD but 
  there was variation in the corresponding ratings on the rubric. Observers will 
  spend more time discussing these items on the rubric that do not directly match 
  the LBD so there is a shared understanding about how these items should be 
  rated. Observers will also be asked to provide written comments to explain any 
  cases where the marks on the LBD appear favorable but the rubric rating is low, 
  and vice versa. 


A second training session for the observers will be held prior to data collection for the 
study. At that time, results of the pilot observations will be debriefed and the items listed 
above will be reviewed. Observers also will be asked if they encountered any problems trans- 
ferring the observation data to the rubric and whether any additional data should be col- 
lected on-site to better reflect the classroom observation experience. 


(Appendix) Professional Background of Classroom Observers 


The table below provides a summary of the professional experience of the five observers 
used in this pilot test of the data collection instrument. For the actual data collection, a 
total of seven observers are planned. Two NBCTs are being sought to fill out the observation 
team. 
Table 11: Science education experience, LBD involvement, and related experience of the 
5 observers who participated in the pilot. 


Observer 1 

Science Education Experience: 14 years science education experience - high school physics 
teacher in Tennessee, Pennsylvania and West Virginia; School/District Consultant for 
Math/Science Partnership Projects 

LBD Involvement: Worked with school districts in Tennessee training principals in 
collection and analysis of classroom observation data using LBD system. 

Related Experience: Adjunct Faculty Member, TN postsecondary school (taught physics and 
physical science courses for future teachers); Coordinator of US DOE funded curriculum 
development project. 


Observer 2 

Science Education Experience: 14 years of experience as high school science teacher, 
Kentucky; Science Content Specialist, KY school district. 

LBD Involvement: Utilized the LBD program for classroom observations in KY; utilized LBD 
data to analyze program improvement efforts 

Related Experience: Adjunct Faculty Member, Kentucky postsecondary school. Taught high 
school science education methods courses 


Observer 3 

Science Education Experience: 30 years science education experience - biology/physical 
science teacher in Missouri and Tennessee; University professor of science education; 
Director of math/science partnership projects; Owner/Executive Director of 
science/mathematics program improvement consulting firm 

LBD Involvement: Utilized the LBD program as part of federal program development work; 
trained as a Program Improvement Profile observer using the LBD program; worked with 
school districts in Tennessee training principals in collection and analysis of classroom 
observation data using LBD system. 

Related Experience: Education Partnerships Team/Program Leader for large U.S. 
corporation; Director of federal science resource collaborative at Univ. of Tennessee; 
Assistant Professor of Science Education, TN postsecondary school. 


Observer 4 

Science Education Experience: 28 years as high school and middle school science teacher 
in large KY school district 

LBD Involvement: Classroom observer using the LBD program for the past 12 years; 
Certified Reviewer for the NSTA Science Program Improvement Review which utilizes the 
LBD instrument 

Related Experience: Regional manager of Partnership Reform Initiatives in Science and 
Math (NSF funded project); Consultant for a Kentucky High School Math Science 
Partnership technology education project; Science Education Consultant with a regional 
cooperative 


Observer 5 

Science Education Experience: 30 years as an elementary and middle grades science 
teacher in two KY school districts 

LBD Involvement: Classroom observer using the LBD program for the past 12 years; 
Certified Reviewer for the NSTA Science Program Improvement Review which utilizes the 
LBD instrument 

Related Experience: Regional manager of Partnership Reform Initiatives in Science and 
Math (NSF funded project); Science Education Consultant for a federal initiative focusing 
on school improvement initiatives; National Presidential Award for Excellence in Science 
Teaching 
Appendix D. Construction of the student analytic file
The statistical analysis of administrative data examines the impact of
the National Board certification process on teachers’ effectiveness in
increasing their students’ test scores. Student-level data files were
collected for SYs 2007/08, 2008/09, 2009/10, and 2010/11 from KDE,
and for SYs 2008/09, 2009/10, 2010/11, and 2011/12 from CPS.
These student-level data are provided in several different files,
including school enrollment records, student demographic records,
student course transcripts, and student test scores (EXPLORE, PLAN,
and ACT). The course transcript file includes records for all of the
courses that each student took in the corresponding year, with the
teacher of record for each of those courses.


NBPTS provided a teacher-level file with records on new NBC appli- 
cants in the 2001-2002 application cycle through the 2011-2012 ap- 
plication cycle. The variables in this file include teacher names and 
email addresses, school and district names, cohort, certificate type, 
cycle date, application date, and certification status. We used the 
teacher names, school names, and email addresses to match these 
records to the teachers in the student-level course data from KDE 
and CPS. 
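
The matching step described above can be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: the field names (`name`, `school`, `email`, `teacher_id`), the normalization rule, and the two-stage order (email first, then name plus school) are all assumptions made for the example.

```python
def normalize(text):
    """Lowercase and strip punctuation/extra spaces so trivial differences do not block a match."""
    return " ".join("".join(ch if ch.isalnum() else " " for ch in text.lower()).split())


def match_applicants(applicants, roster):
    """Return {applicant index: roster teacher_id} for applicants found in the roster.

    Assumed (hypothetical) matching order: exact email first, then name + school.
    """
    by_email = {t["email"].lower(): t["teacher_id"] for t in roster if t.get("email")}
    by_name_school = {
        (normalize(t["name"]), normalize(t["school"])): t["teacher_id"] for t in roster
    }
    matches = {}
    for i, a in enumerate(applicants):
        email = (a.get("email") or "").lower()
        if email and email in by_email:
            matches[i] = by_email[email]
            continue
        key = (normalize(a["name"]), normalize(a["school"]))
        if key in by_name_school:
            matches[i] = by_name_school[key]
    return matches


# Illustrative records only; not drawn from the study data.
roster = [
    {"teacher_id": "T1", "name": "Pat Doe", "school": "Central High School", "email": "pdoe@example.org"},
    {"teacher_id": "T2", "name": "Lee  Roe", "school": "North H.S.", "email": ""},
]
applicants = [
    {"name": "Patricia Doe", "school": "Central High", "email": "PDoe@example.org"},
    {"name": "lee roe", "school": "North H S", "email": None},
    {"name": "Sam Poe", "school": "East High", "email": "spoe@example.org"},
]
matches = match_applicants(applicants, roster)
```

In this sketch the first applicant matches on email despite a different name spelling, the second matches on normalized name and school, and the third is left unmatched.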


We combined the data described above to create one longitudinal file 
for Kentucky and one for Chicago Public Schools. Students who are 
missing records on the pretest and/or posttest variables were 
dropped from the sample. Each file in Kentucky has multiple records 
per student in each subject that correspond to records for the semes- 
ters from the administration of the pretest through the administra- 
tion of the posttest. The CPS file includes up to two records per 
student, one each for the PLAN and ACT analyses. Records corre- 
sponding to the PLAN analysis include grade 9 classroom teachers in 
the core subject, as well as PLAN and EXPLORE test scores. The rec- 
ords corresponding to the ACT analysis include both grade 10 and 
grade 11 classroom teachers in the core subject areas, as well as ACT 
and PLAN test scores. This allows us to attribute gains in student 
achievement to all of the teachers who taught a student in the time 
from the pretest to the posttest. Standardized state or district course 
codes were used to categorize courses into three subject areas: Eng- 
lish, math, and science. 


Handling students with missing teachers 


One issue we encountered when constructing the data file is that not 
all students take a course in the same subject area for all semesters 
(for all years) between the pretest and the posttest. For these cases, 
we created a new record for the missing periods and assigned a “miss- 
ing” teacher ID so that these students could be included in the anal- 
yses. 
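
One way to picture the placeholder records described above is the following sketch. The record layout (`student`, `subject`, `semester`, `teacher_id`) and the semester labels are hypothetical; only the idea of assigning a "MISSING" teacher ID comes from the text.

```python
def fill_missing_semesters(records, semesters):
    """Add a placeholder record for every semester in which a student has
    no course in a subject, so the student stays in the analysis sample.

    records   -- list of dicts with keys student, subject, semester, teacher_id
    semesters -- semesters from the pretest through the posttest, in order
    """
    seen = {(r["student"], r["subject"], r["semester"]) for r in records}
    pairs = sorted({(r["student"], r["subject"]) for r in records})
    filled = list(records)
    for student, subject in pairs:
        for semester in semesters:
            if (student, subject, semester) not in seen:
                filled.append({
                    "student": student,
                    "subject": subject,
                    "semester": semester,
                    "teacher_id": "MISSING",  # placeholder teacher ID
                })
    return filled


# Illustrative data: one student with a math course in the fall only.
records = [{"student": "S1", "subject": "math", "semester": "2008F", "teacher_id": "T7"}]
filled = fill_missing_semesters(records, ["2008F", "2009S"])
```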


When we examined the records with missing teachers more closely, we 
found that one reason this occurred in Kentucky was that some 
schools are on a block schedule. Block courses meet 
more frequently during the week or have class periods with a longer 
duration than traditional courses in order to allow students to receive 
a full year of credit in a single semester. 


Some 37 percent of students in the Kentucky sample had a block 
course in at least one of the semesters between the pretest and the 
posttest. For these cases, we created a dummy “block” variable in the 
semester that the block course was taken to indicate that the teacher 
was teaching a block course. If no course was taken in the other se- 
mester of the same school year, we created a new record for this se- 
mester with a missing “block” teacher ID. 


There were also some students who took block courses in the same 
subject in both the fall and spring semesters in a single academic 
year. These students experienced twice as much instructional time as 
students in a traditional yearlong course. We created a separate 
dummy variable (“double block”) to indicate that these students 
completed two block courses in the same year. Only 2 percent of stu- 
dents in the Kentucky sample were in this category. 
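
The block and double-block rules above might be coded along the following lines. This is a sketch under assumed inputs: the `is_block` field and the function signature are hypothetical, and the logic covers one student, subject, and school year at a time.

```python
def flag_block_courses(fall, spring):
    """Flag block courses for one student, subject, and school year.

    fall, spring -- a record dict with keys teacher_id and is_block, or None
                    if the student took no course in that semester.
    Returns a dict mapping "fall"/"spring" to the flagged records.
    """
    out = {}
    for term, rec in (("fall", fall), ("spring", spring)):
        if rec is not None:
            out[term] = {
                "teacher_id": rec["teacher_id"],
                "block": int(rec["is_block"]),
                "double_block": 0,
            }
    if len(out) == 2 and all(r["block"] for r in out.values()):
        # Block courses in both semesters of the same year.
        for r in out.values():
            r["double_block"] = 1
    else:
        # A block course with no course in the other semester gets a
        # placeholder record with the "BLOCK" teacher ID.
        for term, other in (("fall", "spring"), ("spring", "fall")):
            if term in out and out[term]["block"] and other not in out:
                out[other] = {"teacher_id": "BLOCK", "block": 0, "double_block": 0}
    return out


single = flag_block_courses({"teacher_id": "T1", "is_block": True}, None)
double = flag_block_courses(
    {"teacher_id": "T1", "is_block": True},
    {"teacher_id": "T2", "is_block": True},
)
```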


Handling students with multiple teachers 


Another issue that we encountered when creating the data file was 
that some students have more than one teacher in the same subject 
area in a single semester. One reason this can occur is that students 
complete both a core course and an elective course in the same 
subject (e.g., a core English course and a creative writing course). For 
Kentucky, we reviewed the descriptions of the state course codes and 
categorized courses as core if they counted toward the state gradua- 
tion requirements in the subject, or elective otherwise. For CPS, we 
used the descriptions from the CPS graduation requirements. We 
created a dummy variable to indicate whether students had an elec- 
tive course in the same subject area as the core course, and then 
dropped the records for the elective courses. 


There are other reasons why students might have multiple teachers. 
Some students may switch classes or schools in the middle of the se- 
mester. Other students may enroll in more than one core course in 
the same subject area in a single semester. In all of these cases, it is 
difficult to identify to which teacher to attribute changes in student 
outcomes. We created a new record for the semester in which this oc- 
curred, and assigned a missing “multiple” teacher ID variable that in- 
dicates the student had multiple teachers during the semester. 
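
The elective and multiple-teacher rules above can be illustrated with a short sketch. The record fields (including `course_type`) are hypothetical, and the treatment of semesters with only an elective course is an assumption, since the text does not describe that case.

```python
from collections import defaultdict


def resolve_teachers(records):
    """Collapse course records to one record per student, subject, and semester.

    records -- list of dicts with keys student, subject, semester,
               teacher_id, and course_type ("core" or "elective")
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["student"], r["subject"], r["semester"])].append(r)

    resolved = []
    for key in sorted(groups):
        recs = groups[key]
        core = [r for r in recs if r["course_type"] == "core"]
        if not core:
            # Assumption: elective-only semesters are left to the
            # missing-teacher step and are skipped here.
            continue
        teachers = {r["teacher_id"] for r in core}
        resolved.append({
            "student": key[0],
            "subject": key[1],
            "semester": key[2],
            # Elective records are dropped, but their presence is kept as a flag.
            "had_elective": int(len(core) < len(recs)),
            "teacher_id": core[0]["teacher_id"] if len(teachers) == 1 else "MULTIPLE",
        })
    return resolved


# Illustrative data only.
records = [
    {"student": "S1", "subject": "english", "semester": "2008F", "teacher_id": "T1", "course_type": "core"},
    {"student": "S1", "subject": "english", "semester": "2008F", "teacher_id": "T2", "course_type": "elective"},
    {"student": "S2", "subject": "math", "semester": "2008F", "teacher_id": "T3", "course_type": "core"},
    {"student": "S2", "subject": "math", "semester": "2008F", "teacher_id": "T4", "course_type": "core"},
]
resolved = resolve_teachers(records)
```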


Finally, in CPS some teachers matched to at most five students in 
the analysis sample. We combined these teachers into a single 
category for purposes of estimating teacher fixed-effect models. 
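
A possible way to pool low-count teachers is sketched below. The input layout (a list of student and teacher ID pairs), the function name, and the "SMALL" label are illustrative assumptions; only the threshold of five students comes from the text.

```python
from collections import Counter


def pool_small_teachers(assignments, max_students=5, label="SMALL"):
    """Replace the ID of any teacher matched to at most `max_students`
    distinct students with a single pooled label.

    assignments -- list of (student, teacher_id) pairs
    """
    students_per_teacher = Counter(t for _, t in set(assignments))
    return [
        (s, label if students_per_teacher[t] <= max_students else t)
        for s, t in assignments
    ]


# Illustrative data: T1 teaches six students, T2 teaches five.
assignments = [(f"S{i}", "T1") for i in range(6)] + [(f"S{i}", "T2") for i in range(5)]
pooled = pool_small_teachers(assignments)
```

Placeholder IDs such as "MISSING" or "MULTIPLE" are not treated specially in this sketch; a real implementation would likely exclude them before counting.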


Table 12 summarizes the number and percentage of observations 
with students assigned to teachers in each of these categories. 


Table 12: Number and percentage of observations with students assigned to “BLOCK,” 
“MISSING,” or “MULTIPLE.” 


KY ACT
BLOCK 3,678 4,091 
MISSING 6,460 5.6 7,663 6.7 
MULTIPLE 10,350 9.0 [illegible] 5.0 
Total 114,465 114,465 114,465 

KY PLAN
BLOCK 0 0.0 0 0.0 
MISSING 23,157 28.8 [illegible] 
MULTIPLE 6,971 8.7 3,579 4.4 
Total 80,490 80,490 

CPS PLAN
No Class 92 48 
MISSING 1,052 1.5 1,095 1.6 
MULTIPLE 8,332 12.0 10,182 14.6 
Small 385 0.6 542 0.8 
Total 69,741 69,741 


NOTE: Small reflects teachers with at most five students in the sample. 
Appendix E. Complete results from all model specifications and outcomes


Table 13: Results for signaling model, mathematics 


Kentucky PLAN (1) (2) (3) (4) (5) 


Effect of having a National effect size 0.122 0.096 0.065 0.056 0.070 
Board certified teaching in any std. error 0.034 0.031 0.028 0.024 0.018 
semester on student PLAN scores p-value 0.000 0.002 0.018 0.022 0.000 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming EXPLORE No No Yes Yes Yes 
Observations 80,253 80,253 80,253 80,253 80,253 
Schools 338 
R² 0.51 0.51 0.53 0.53 0.55 
Kentucky ACT (1) (2) (3) (4) (5) 
Effect of having a National effect size 0.099 0.082 0.061 0.056 0.078 
Board certified teaching in any std. error 0.038 0.038 0.024 0.024 0.009 
semester on student ACT scores p-value 0.008 0.030 0.011 0.019 0.000 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 114,004 114,004 114,004 114,004 114,004 
Schools 313 
R² 0.66 0.66 0.68 0.68 0.69 
CPS PLAN (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.143 0.029 0.072 0.029 0.004 
certified teaching in 9th grade on std. error 0.049 0.028 0.038 0.029 0.027 
student PLAN scores p-value 0.003 0.291 0.060 0.318 0.876 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 69,741 69,741 69,741 69,741 69,741 
Schools 96 
R² 0.58 0.62 0.62 0.63 0.63 
CPS ACT (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.205 0.103 0.132 0.098 0.077 
certified teaching in 10th or 11th std. error 0.027 0.025 0.025 0.024 0.023 
grade on student ACT scores p-value 0.000 0.000 0.000 0.000 0.001 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 48,546 48,546 48,546 48,546 48,546 
Schools 95 
R² 0.67 0.72 0.71 0.73 0.73 


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free and reduced price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, ra- 
cial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district-level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
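
As a rough reading aid for the tables in this appendix, the reported p-values can be approximated from each effect size and its standard error. The sketch below assumes a two-sided test with a standard normal reference distribution; the report does not state which reference distribution was used, and the inputs are rounded to three decimals, so small discrepancies with the printed p-values are expected. The function name is hypothetical.

```python
import math


def two_sided_p(effect, std_error):
    """Two-sided p-value for effect / std_error under a standard normal."""
    z = abs(effect / std_error)
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2))


# Columns (1) and (2) of the Kentucky PLAN panel in Table 13.
p_col1 = two_sided_p(0.122, 0.034)
p_col2 = two_sided_p(0.096, 0.031)
```

Rounded to three decimals, these two values agree with the p-values printed for those columns.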
Table 14: Results for signaling model, English 


Kentucky PLAN (1) (2) (3) (4) (5) 
Effect of having a National Board effect size -0.004 0.001 -0.010 -0.002 0.000 
certified teaching in any semester std. error 0.025 0.026 0.020 0.020 0.017 
on student PLAN scores p-value 0.859 0.959 0.606 0.937 0.996 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming EXPLORE No No Yes Yes Yes 
Observations 80,263 80,263 80,263 80,263 80,263 
Schools 338 
R² 0.61 0.61 0.62 0.62 0.63 
Kentucky ACT (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.076 0.064 0.053 0.028 0.026 
certified teaching in any semester std. error 0.032 0.029 0.022 0.019 0.016 
on student ACT scores p-value 0.017 0.027 0.019 0.153 0.098 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 114,019 114,019 114,019 114,019 114,019 
Schools 313 
R² 0.70 0.70 0.71 0.71 0.71 
CPS PLAN (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.079 0.043 0.055 0.039 0.056 
certified teaching in any semester std. error 0.031 0.028 0.029 0.027 0.025 
on student PLAN scores p-value 0.012 0.128 0.055 0.157 0.026 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 69,741 69,741 69,741 69,741 69,741 
Schools 96 
R² 0.68 0.70 0.69 0.70 0.70 
CPS ACT (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.116 0.052 0.063 0.046 0.062 
certified teaching in any semester std. error 0.021 0.015 0.021 0.017 0.017 
on student ACT scores p-value 0.000 0.000 0.003 0.006 0.000 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 48,546 48,546 48,546 48,546 48,546 
Schools 95 
R² 0.71 0.74 0.73 0.74 0.74 


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free and reduced price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, ra- 
cial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district-level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
Table 15: Results for signaling model, science 


Kentucky PLAN (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.032 0.022 0.008 0.005 -0.015 
certified teaching in any semester std. error 0.028 0.025 0.032 0.027 0.026 
on student PLAN scores p-value 0.245 0.365 0.807 0.843 0.555 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming EXPLORE No No Yes Yes Yes 
Observations 80,163 80,163 80,163 80,163 80,163 
Schools 338 
R² 0.42 0.43 0.43 0.43 0.44 
Kentucky ACT (1) (2) (3) (4) (5) 
Effect of having a National effect size 0.040 0.022 0.021 0.006 0.026 
Board certified teaching in any std. error 0.038 0.042 0.034 0.038 0.030 
semester on student ACT scores p-value 0.291 0.591 0.538 0.866 0.388 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 113,923 113,923 113,923 113,923 113,923 
Schools 313 
R² 0.49 0.50 0.50 0.51 0.52 
CPS PLAN (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.304 0.028 0.124 0.023 -0.027 
certified teaching in any semester std. error 0.074 0.031 0.045 0.029 0.035 
on student PLAN scores p-value 0.000 0.361 0.006 0.427 0.449 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 69,741 69,741 69,741 69,741 69,741 
Schools 96 
R² 0.49 0.54 0.53 0.55 0.55 
CPS ACT (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.190 0.020 0.072 0.013 0.013 
certified teaching in any semester std. error 0.033 0.026 0.033 0.023 [illegible] 
on student ACT scores p-value 0.000 0.432 0.027 0.576 0.545 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 48,546 48,546 48,546 48,546 48,546 
Schools 95 
R² 0.52 0.57 0.56 0.58 0.58 


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free and reduced price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, ra- 
cial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district-level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
Table 16: Results for signaling model, all subjects (pooled) 


Kentucky PLAN (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.042 0.031 0.015 0.013 0.010 
certified teaching in any semester std. error 0.018 0.017 0.014 0.013 0.011 
on student PLAN scores p-value 0.017 0.070 0.274 0.336 0.366 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming EXPLORE No No Yes Yes Yes 
Observations 240,679 240,679 240,679 240,679 240,679 
Schools 338 
R² 0.51 0.51 0.52 0.52 0.53 
Kentucky ACT (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.071 0.058 0.042 0.034 0.038 
certified teaching in any semester std. error 0.022 0.022 0.015 0.015 0.012 
on student ACT scores p-value 0.001 0.008 0.005 0.025 0.002 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 341,946 341,946 341,946 341,946 341,946 
Schools 313 
R² 0.61 0.61 0.62 0.62 0.62 
CPS PLAN (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.167 0.035 0.080 0.030 0.019 
certified teaching in any semester std. error 0.036 0.021 0.029 0.021 0.021 
on student PLAN scores p-value 0.000 0.088 0.005 0.146 0.378 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 209,223 209,223 209,223 209,223 209,223 
Schools 96 
R² 0.58 0.62 0.61 0.62 0.62 
CPS ACT (1) (2) (3) (4) (5) 
Effect of having a National Board effect size 0.163 0.062 0.087 0.056 0.054 
certified teaching in any semester std. error 0.017 0.012 0.022 0.012 0.012 
on student ACT scores p-value 0.000 0.000 0.000 0.000 0.000 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 145,638 145,638 145,638 145,638 145,638 
Schools 95 
R² 0.63 0.67 0.66 0.67 0.67 


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free and reduced price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, ra- 
cial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district-level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
Table 17: Results for screening model, mathematics 


Kentucky PLAN (1) (2) (3) (4) (5) 
Effect of variable for number of semesters effect size 0.071 0.061 0.043 0.037 0.043 
with an ever-certified teacher on PLAN std. error 0.016 0.015 0.013 0.012 0.010 
scores p-value 0.000 0.000 0.001 0.003 0.000 
Effect of variable for number of semesters effect size -0.014 -0.020 0.001 -0.002 -0.010 
with a never-certified teacher on PLAN std. error 0.026 0.028 0.023 0.024 0.021 
scores p-value 0.607 0.470 0.954 0.944 0.637 
Effect of variable for number of semesters effect size -0.022 -0.024 -0.023 -0.021 0.014 
with an ever-withdrawn teacher on std. error 0.018 0.018 0.018 0.017 0.020 
PLAN scores p-value 0.212 0.180 0.199 0.230 0.488 
effect size 0.085 0.081 0.041 0.039 0.053 
Test: Ever certified - never certified std. error 0.029 0.032 0.022 0.024 0.022 
p-value 0.004 0.012 0.060 0.115 0.014 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming EXPLORE No No Yes Yes Yes 
Observations 80,253 80,253 80,253 80,253 80,253 
Schools 338 
R² 0.51 0.51 0.53 0.53 0.55 
Kentucky ACT (1) (2) (3) (4) (5) 
Effect of variable for number of semesters effect size 0.053 0.045 0.028 0.026 0.039 
with an ever-certified teacher on ACT std. error 0.014 0.014 0.009 0.009 0.008 
scores p-value 0.000 0.001 0.001 0.003 0.000 
Effect of variable for number of semesters effect size 0.031 0.015 0.021 0.020 0.003 
with a never-certified teacher on ACT std. error 0.019 0.014 0.009 0.010 0.012 
scores p-value 0.105 0.277 0.023 0.050 0.822 
Effect of variable for number of semesters effect size 0.040 0.057 0.041 0.045 0.057 
with an ever-withdrawn teacher on ACT std. error 0.029 0.030 0.026 0.025 0.020 
scores p-value 0.169 0.056 0.107 0.070 0.005 
effect size 0.022 0.030 0.006 0.005 0.036 
Test: Ever certified - never certified std. error 0.026 0.022 0.014 0.014 0.014 
p-value 0.395 0.175 0.652 0.707 0.011 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 114,004 114,004 114,004 114,004 114,004 
Schools 313 
R² 0.66 0.66 0.68 0.68 0.69 
CPS PLAN (1) (2) (3) (4) (5) 
Effect of variable for number of years effect size 0.120 0.041 0.062 0.036 0.027 
with an ever-certified teacher on PLAN std. error 0.043 0.024 0.035 0.026 0.026 
scores p-value 0.005 0.091 0.071 0.163 0.299 
Effect of variable for number of years effect size -0.018 -0.005 0.022 0.013 0.007 
with a never-certified teacher on PLAN std. error 0.036 0.029 0.033 0.029 0.032 
scores p-value 0.620 0.856 0.516 0.659 0.835 
Effect of variable for number of years effect size 0.085 0.043 0.058 0.040 0.064 
with an outcome unknown teacher on std. error 0.067 0.055 0.060 0.057 0.050 
PLAN scores p-value 0.205 0.431 0.335 0.480 0.202 
effect size 0.138 0.046 0.041 0.023 0.021 
Test: Ever certified - never certified std. error 0.052 0.036 0.039 0.035 0.037 
p-value 0.008 0.204 0.297 0.502 0.579 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 69,741 69,741 69,741 69,741 69,741 
Schools 96 
R² 0.58 0.62 0.62 0.63 0.63 
CPS ACT (1) (2) (3) (4) (5) 
Effect of variable for number of years effect size 0.169 0.090 0.161 0.081 0.086 
with an ever-certified teacher on ACT std. error 0.020 0.018 0.021 0.022 0.018 
scores p-value 0.000 0.000 0.000 0.000 0.000 
Effect of variable for number of years effect size 0.007 0.008 0.005 0.008 0.014 
with a never-certified teacher on ACT std. error 0.025 0.026 0.024 0.026 0.023 
scores p-value 0.786 0.764 0.838 0.760 0.545 
Effect of variable for number of years effect size 0.070 0.057 0.067 0.057 0.059 
with an outcome unknown teacher on std. error 0.025 0.015 0.023 0.015 0.014 
ACT scores p-value 0.005 0.000 0.004 0.000 0.000 
effect size 0.162 0.082 0.157 0.073 0.072 
Test: Ever certified - never certified std. error 0.031 0.031 0.031 0.034 0.030 
p-value 0.000 0.009 0.000 0.031 0.016 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 48,546 48,546 48,546 48,546 48,546 
Schools 95 


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free and reduced price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, ra- 
cial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district-level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
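
The "Test: Ever certified - never certified" rows in the screening tables report the difference between the ever-certified and never-certified coefficients. The sketch below shows how such a difference and its standard error relate to the two coefficients. The covariance between the two estimates is not reported, so it is set to zero here purely as an assumption; the function name and signature are hypothetical, and the resulting standard error is only a rough stand-in for the printed one.

```python
import math


def difference_test(b_ever, b_never, se_ever, se_never, cov=0.0):
    """Difference of two coefficients and its standard error.

    cov is the covariance between the two estimates; it is not reported
    in the tables, so the default of zero is an assumption.
    """
    diff = b_ever - b_never
    se = math.sqrt(se_ever ** 2 + se_never ** 2 - 2 * cov)
    return diff, se


# Column (1) of the Kentucky PLAN panel in Table 17.
diff, se = difference_test(0.071, -0.014, 0.016, 0.026)
```

The difference reproduces the printed effect size of 0.085. The zero-covariance standard error comes out slightly above the printed 0.029, which would be consistent with a small positive covariance between the two estimates.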
Table 18: Results for screening model, English 


Kentucky PLAN (1) (2) (3) (4) (5) 
Effect of variable for number of semesters effect size -0.007 -0.002 -0.007 -0.002 0.002 
with an ever-certified teacher on std. error 0.012 0.012 0.009 0.009 0.008 
PLAN scores p-value 0.590 0.853 0.441 0.808 0.836 
Effect of variable for number of semesters effect size 0.018 0.017 0.015 0.017 0.016 
with a never-certified teacher on std. error 0.011 0.011 0.011 0.011 0.009 
PLAN scores p-value 0.087 0.136 0.166 0.128 0.069 
Effect of variable for number of semesters effect size -0.046 -0.036 -0.033 -0.029 -0.019 
with an ever-withdrawn teacher on std. error 0.022 0.016 0.020 0.018 0.017 
PLAN scores p-value 0.039 0.013 0.089 0.108 0.251 
effect size -0.025 -0.019 -0.023 -0.019 -0.014 
Test: Ever certified - never certified std. error 0.016 0.016 0.014 0.014 0.012 
p-value 0.133 0.242 0.112 0.171 0.223 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming EXPLORE No No Yes Yes Yes 
Observations 80,263 80,263 80,263 80,263 80,263 
Schools 338 
R² 0.61 0.61 0.62 0.62 0.63 
Kentucky ACT (1) (2) (3) (4) (5) 
Effect of variable for number of semesters effect size 0.024 0.015 0.014 0.003 0.012 
with an ever-certified teacher on std. error 0.012 0.011 0.010 0.008 0.007 
ACT scores p-value 0.046 0.190 0.156 0.682 0.105 
Effect of variable for number of semesters effect size 0.002 -0.007 0.003 -0.006 -0.001 
with a never-certified teacher on std. error 0.018 0.019 0.014 0.014 0.012 
ACT scores p-value 0.905 0.717 0.824 0.667 0.917 
Effect of variable for number of semesters effect size -0.035 -0.026 -0.034 -0.026 0.004 
with an ever-withdrawn teacher on std. error 0.014 0.013 0.011 0.011 0.017 
ACT scores p-value 0.012 0.043 0.002 0.024 0.805 
effect size 0.022 0.022 0.011 0.009 0.013 
Test: Ever certified - never certified std. error 0.021 0.022 0.017 0.017 0.015 
p-value 0.296 0.315 0.519 0.568 0.391 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 114,019 114,019 114,019 114,019 114,019 
Schools 313 
R² 0.70 0.70 0.71 0.71 0.71 
CPS PLAN (1) (2) (3) (4) (5) 
Effect of variable for number of years effect size 0.094 0.036 0.052 0.030 0.045 
with an ever-certified teacher on PLAN std. error 0.028 0.019 0.026 0.020 0.017 
scores p-value 0.001 0.065 0.046 0.129 0.009 
Effect of variable for number of years effect size 0.027 0.016 0.056 0.029 -0.010 
with a never-certified teacher on PLAN std. error 0.030 0.027 0.029 0.028 0.024 
scores p-value 0.372 0.550 0.052 0.288 0.667 
Effect of variable for number of years effect size 0.049 0.014 0.042 0.017 0.024 
with an outcome unknown teacher on std. error 0.032 0.021 0.027 0.020 0.018 
PLAN scores p-value 0.129 0.516 0.112 0.399 0.188 
effect size 0.067 0.019 -0.004 0.001 0.056 
Test: Ever certified - never certified std. error 0.036 0.031 0.029 0.030 0.027 
p-value 0.062 0.525 0.899 0.974 0.043 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming test score No No Yes Yes Yes 
Observations 69,741 69,741 69,741 69,741 69,741 
Schools 96 
R² 0.68 0.70 0.69 0.70 0.70 
CPS ACT (1) (2) (3) (4) (5) 
Effect of variable for number of years effect size 0.082 0.034 0.039 0.027 0.046 
with an ever-certified teacher on ACT std. error 0.016 0.011 0.016 0.012 0.015 
scores p-value 0.000 0.002 0.013 0.028 0.002 
Effect of variable for number of years effect size 0.029 0.002 0.038 0.009 0.005 
with a never-certified teacher on ACT std. error 0.036 0.027 0.026 0.025 0.018 
scores p-value 0.428 0.936 0.148 0.725 0.791 
Effect of variable for number of years effect size 0.008 0.012 0.019 0.014 0.005 
with an outcome unknown teacher on std. error 0.019 0.014 0.017 0.014 0.013 
ACT scores p-value 0.662 0.370 0.263 0.317 0.687 
effect size 0.053 0.032 0.001 0.018 0.041 
Test: Ever certified - never certified std. error 0.038 0.027 0.026 0.026 0.021 
p-value 0.157 0.249 0.957 0.477 0.049 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming PLAN No No Yes Yes Yes 
Observations 48,546 48,546 48,546 48,546 48,546 
Schools 95 
R² 0.71 0.74 0.73 0.74 0.74 


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free and reduced price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, 
racial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district-level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
Table 19: Results for screening model, science 


Kentucky PLAN (1) (2) (3) (4) (5) 
Effect of variable for number of semesters effect size 0.033 0.027 0.023 0.021 0.000 
with an ever-certified teacher on PLAN std. error [illegible] [illegible] [illegible] 0.015 0.013 
scores p-value 0.029 0.052 0.190 0.157 0.999 
Effect of variable for number of semesters effect size -0.003 0.002 0.008 0.010 -0.006 
with a never-certified teacher on PLAN std. error [illegible] 0.010 [illegible] 0.012 0.013 
scores p-value 0.803 0.883 0.583 0.398 0.643 
Effect of variable for number of semesters effect size 0.003 -0.005 0.013 0.008 -0.002 
with an ever-withdrawn teacher on PLAN std. error 0.027 [illegible] [illegible] 0.020 [illegible] 
scores p-value 0.925 0.858 0.530 0.680 0.924 
effect size 0.037 0.025 0.016 0.011 0.006 
Test: Ever certified - never certified std. error 0.024 0.021 0.028 0.023 0.020 
p-value 0.119 0.226 0.573 0.634 0.753 
Additional controls: 
Student characteristics Yes Yes Yes Yes Yes 
Teacher experience proxy Yes Yes Yes Yes Yes 
School characteristics No Yes No Yes No 
School FE No No No No Yes 
Average incoming EXPLORE No No Yes Yes Yes 
observations 80,163 80,163 80,163 80,163 80,163 
schools 338 
R? 0.42 0.43 0.43 0.43 0.44 
Kentucky ACT                           (1)       (2)       (3)       (4)       (5)
Effect of variable for number of semesters with an ever-certified teacher on ACT scores
  effect size                        0.011     0.005     0.005    -0.001     0.011
  std. error                         0.014     0.015     0.014     0.014     0.012
  p-value                            0.431     0.730     0.696     0.971     0.356
Effect of variable for number of semesters with a never-certified teacher on ACT scores
  effect size                       -0.002     0.000    -0.005    -0.003    -0.008
  std. error                         0.022     0.018     0.019     0.015     0.012
  p-value                            0.926     0.999     0.797     0.817     0.499
Effect of variable for number of semesters with an ever-withdrawn teacher on ACT scores
  effect size                       -0.010    -0.066    -0.030    -0.074    -0.039
  std. error                         [illegible in source]
  p-value                            0.696     0.003     0.178     0.000     0.044
Test: Ever certified - never certified
  effect size                        0.013     0.005     0.010     0.003     0.019
  std. error                         0.026     0.023     0.023     0.020     0.017
  p-value                            0.606     0.820     0.655     0.886     0.247
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes       Yes
  Teacher experience proxy             Yes       Yes       Yes       Yes       Yes
  School characteristics                No       Yes        No       Yes        No
  School FE                             No        No        No        No       Yes
  Average incoming PLAN                 No        No       Yes       Yes       Yes
observations                       113,923   113,923   113,923   113,923   113,923
schools                                313
R²                                    0.49      0.50      0.50      0.51      0.52
CPS PLAN                               (1)       (2)       (3)       (4)       (5)
Effect of variable for number of years with an ever-certified teacher on PLAN scores
  effect size                        0.281     0.040     0.109     0.030    -0.010
  std. error                         0.063     0.026     0.041     0.025     0.029
  p-value                            0.000     0.124     0.008     0.243     0.725
Effect of variable for number of years with a never-certified teacher on PLAN scores
  effect size                        0.041     0.011     0.055     0.026     0.020
  std. error                         0.044     0.028     0.028     0.027     0.034
  p-value                            0.347     0.691     0.047     0.341     0.547
Effect of variable for number of years with an outcome unknown teacher on PLAN scores
  effect size                        0.087     0.031     0.034     0.021     0.040
  std. error                         0.051     0.022     0.038     0.025     0.022
  p-value                            0.088     0.164     0.376     0.364     0.070
Test: Ever certified - never certified
  effect size                        0.240     0.029     0.054     0.004    -0.030
  std. error                         0.073     0.034     0.040     0.032     0.040
  p-value                            0.001     0.396     0.180     0.911     0.447
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes       Yes
  Teacher experience                   Yes       Yes       Yes       Yes       Yes
  School characteristics                No       Yes        No       Yes        No
  School FE                             No        No        No        No       Yes
  Average incoming test score           No        No       Yes       Yes       Yes
Observations                        69,741    69,741    69,741    69,741    69,741
Schools                                 96
R²                                    0.50      0.54      0.53      0.55      0.55
CPS ACT                                (1)       (2)       (3)       (4)       (5)
Effect of variable for number of years with an ever-certified teacher on ACT scores
  effect size                        0.148     0.015     0.054     0.009     0.016
  std. error                         0.024     0.016     0.026     0.015     0.016
  p-value                            0.000     0.347     0.035     0.528     0.315
Effect of variable for number of years with a never-certified teacher on ACT scores
  effect size                        0.056    -0.008     0.026    -0.009    -0.019
  std. error                         0.026     0.015     0.013     0.014     0.016
  p-value                            0.034     0.576     0.053     0.548     0.250
Effect of variable for number of years with an outcome unknown teacher on ACT scores
  effect size                        0.049     0.046     0.031     0.039     0.050
  std. error                         0.026     0.019     0.015     0.019     0.019
  p-value                            0.058     0.014     0.041     0.040     0.010
Test: Ever certified - never certified
  effect size                        0.092     0.024     0.029     0.018     0.035
  std. error                         0.035     0.021     0.022     0.020     0.023
  p-value                            0.009     0.258     0.200     0.365     0.128
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes       Yes
  Teacher experience                   Yes       Yes       Yes       Yes       Yes
  School characteristics                No       Yes        No       Yes        No
  School FE                             No        No        No        No       Yes
  Average incoming PLAN                 No        No       Yes       Yes       Yes
Observations                        48,546    48,546    48,546    48,546    48,546
Schools                                 95
R²                                    0.52      0.57      0.56      0.58      0.58


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free- and reduced-price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, 
racial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
Table 20: Results for screening model, all subjects 


Kentucky PLAN                          (1)       (2)       (3)       (4)       (5)
Effect of variable for number of semesters with an ever-certified teacher on PLAN scores
  effect size                        0.027     0.022     0.014     0.013     0.010
  std. error                         [illegible in source]
  p-value                            0.003     0.015     0.082     0.091     0.152
Effect of variable for number of semesters with a never-certified teacher on PLAN scores
  effect size                        0.003     0.003     0.009     0.009     0.003
  std. error                         [illegible in source]
  p-value                            0.694     0.721     0.233     0.244     0.643
Effect of variable for number of semesters with an ever-withdrawn teacher on PLAN scores
  effect size                       -0.022    -0.022    -0.013    -0.013    -0.017
  std. error                         [illegible in source]
  p-value                            0.096     0.089     0.256     0.253     0.158
Test: Ever certified - never certified
  effect size                        0.024     0.019     0.005     0.004     0.007
  std. error                         0.013     0.013     0.013     0.012      0.01
  p-value                            0.063     0.139     0.692     0.767     0.541
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes       Yes
  Teacher experience proxy             Yes       Yes       Yes       Yes       Yes
  School characteristics                No       Yes        No       Yes        No
  School FE                             No        No        No        No       Yes
  Average incoming EXPLORE              No        No       Yes       Yes       Yes
observations                       240,679   240,679   240,679   240,679   240,679
schools                                338
R²                                    0.51      0.51      0.52      0.52      0.53
Kentucky ACT                           (1)       (2)       (3)       (4)       (5)
Effect of variable for number of semesters with an ever-certified teacher on ACT scores
  effect size                        0.031     0.022     0.015     0.010     0.015
  std. error                         0.008     0.008     0.006     0.006     0.005
  p-value                            0.000     0.006     0.013     0.091     0.006
Effect of variable for number of semesters with a never-certified teacher on ACT scores
  effect size                        0.007    -0.002     0.005    -0.002    -0.005
  std. error                         0.013     0.011     0.010     0.010     0.009
  p-value                            0.609     0.876     0.645     0.832     0.588
Effect of variable for number of semesters with an ever-withdrawn teacher on ACT scores
  effect size                       -0.008    -0.008    -0.011    -0.009     0.005
  std. error                         0.014     0.013     0.012     0.011     0.010
  p-value                            0.557     0.549     0.350     0.415     0.627
Test: Ever certified - never certified
  effect size                        0.024     0.024     0.010     0.012     0.020
  std. error                         0.015     0.014     0.012     0.012     0.010
  p-value                            0.117     0.082     0.401     0.303     0.048
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes       Yes
  Teacher experience proxy             Yes       Yes       Yes       Yes       Yes
  School characteristics                No       Yes        No       Yes        No
  School FE                             No        No        No        No       Yes
  Average incoming PLAN                 No        No       Yes       Yes       Yes
observations                       341,946   341,946   341,946   341,946   341,946
schools                                313
R²                                    0.61      0.61      0.62      0.62      0.62




CPS PLAN                               (1)       (2)       (3)       (4)       (5)
Effect of variable for number of years with an ever-certified teacher on PLAN scores
  effect size                        0.161     0.044     0.076     0.036     0.029
  std. error                         0.032     0.017     0.029     0.019     0.018
  p-value                            0.000     0.010     0.008     0.050     0.112
Effect of variable for number of years with a never-certified teacher on PLAN scores
  effect size                        0.017     0.005     0.045     0.020     0.014
  std. error                         0.024     0.018     0.023     0.018     0.019
  p-value                            0.487     0.776     0.055     0.281     0.475
Effect of variable for number of years with an outcome unknown teacher on PLAN scores
  effect size                        0.067     0.024     0.042     0.022     0.035
  std. error                         0.031     0.020     0.029     0.021     0.020
  p-value                            0.031     0.232     0.145     0.286     0.076
Test: Ever certified - never certified
  effect size                        0.144     0.038     0.031     0.016     0.015
  std. error                         0.034     0.022     0.023     0.021     0.021
  p-value                            0.000     0.082     0.174     0.444     0.468
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes       Yes
  Teacher experience                   Yes       Yes       Yes       Yes       Yes
  School characteristics                No       Yes        No       Yes        No
  School FE                             No        No        No        No       Yes
  Average incoming test score           No        No       Yes       Yes       Yes
Observations                       209,223   209,223   209,223   209,223   209,223
Schools                                 96
R²                                    0.62      0.61      0.62      0.62      0.58   [column order unclear in source]
CPS ACT                                (1)       (2)       (3)       (4)       (5)
Effect of variable for number of years with an ever-certified teacher on ACT scores
  effect size                        0.126     0.050     0.066     0.044     0.046
  std. error                         0.013     0.009     0.016     0.009     0.009
  p-value                            0.000     0.000     0.000     0.000     0.000
Effect of variable for number of years with a never-certified teacher on ACT scores
  effect size                        0.034    -0.003     0.024    -0.001     0.002
  std. error                         0.018     0.012     0.014     0.011     0.011
  p-value                            0.068     0.783     0.080     0.960     0.868
Effect of variable for number of years with an outcome unknown teacher on ACT scores
  effect size                        0.039     0.035     0.036     0.034     0.037
  std. error                         0.013     0.009     0.010     0.009     0.009
  p-value                            0.004     0.000     0.000     0.000     0.000
Test: Ever certified - never certified
  effect size                        0.093     0.054     0.042     0.045     0.045
  std. error                         0.021     0.014     0.014     0.014     0.014
  p-value                            0.000     0.000     0.002     0.001     0.001
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes       Yes
  Teacher experience                   Yes       Yes       Yes       Yes       Yes
  School characteristics                No       Yes        No       Yes        No
  School FE                             No        No        No        No       Yes
  Average incoming PLAN                 No        No       Yes       Yes       Yes
Observations                       145,638   145,638   145,638   145,638   145,638
Schools                                 95
R²                                    0.63      0.67      0.66      0.67      0.67


NOTES: Student characteristics include age, number of absences (KY only), racial/ethnic 
background (black or Hispanic), gender, free- and reduced-price lunch eligibility, special 
education and English as a Second Language (ESL) status (KY only), and missing variable 
indicators. School characteristics include school size (in logs), student-teacher ratio, 
racial/ethnic composition of student body (percentage of students who are black, percentage 
of students who are Hispanic), percentage of students eligible for free- and reduced-price 
lunch, student-administrator ratio and per-pupil spending at the district level, urban-centric 
locale indicator (urban, suburban, rural, or town), and school-level average PLAN math, 
English, and science scores. For Kentucky, the teacher experience proxy is the number of 
years the teacher appears in the dataset. Standard errors are clustered by teacher. 
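
The p-values and the "Test: Ever certified - never certified" rows in the screening tables can be roughly checked against the reported effect sizes and standard errors. The sketch below is a reading aid only. It assumes a large-sample, two-sided normal test, which the report does not state, and the function name two_sided_p is illustrative. The inputs are taken from the CPS ACT and Kentucky ACT panels of Table 20.

```python
import math

def two_sided_p(effect_size, std_error):
    """Two-sided p-value under a normal approximation (an assumption,
    not a procedure stated in the report)."""
    z = abs(effect_size / std_error)
    return math.erfc(z / math.sqrt(2.0))

# CPS ACT, all subjects, column (1): ever-certified and never-certified effects.
ever, never = 0.126, 0.034
difference = ever - never  # compare with the reported test effect size of 0.093

# Reported test row for the same column: effect size 0.093, std. error 0.021.
p_test = two_sided_p(0.093, 0.021)

# Kentucky ACT, all subjects, column (5): effect size 0.020, std. error 0.010.
p_ky = two_sided_p(0.020, 0.010)

print(round(difference, 3), round(p_test, 3), round(p_ky, 3))
```

Small gaps between these values and the published ones are expected, because the table entries are rounded to three decimals and the exact reference distribution is unknown. The standard error of the difference cannot be rebuilt this way, since it depends on the covariance between the two coefficients, which the tables do not report.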
Table 21: Results for human capital model, all subjects (pooled) 


Kentucky ACT                           (1)       (2)       (4)       (5)
Current applicants
  effect size                        0.061     0.043     0.019     0.042
  std. error                         0.074     0.060     0.056     0.080
  p-value                            0.408     0.473     0.737     0.595
Past applicants
  effect size                       -0.003    -0.004    -0.018    -0.018
  std. error                         0.038     0.042     0.046     0.055
  p-value                            0.934     0.922     0.698     0.739
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes
  Teacher experience proxy             Yes       Yes       Yes       Yes
  School characteristics                No       Yes       Yes        No
  Teacher FE                           Yes       Yes       Yes       Yes
  School FE                             No        No        No       Yes
  Average incoming PLAN                 No        No       Yes       Yes
observations                       342,462   342,462   342,462   342,462
schools                                313
teachers                             5,438     5,438     5,438     5,438
R²                                    0.59      0.60      0.60      0.59


NOTES: Student covariates include prior test score, demographic variables; model includes 
teacher fixed effects for current teacher and school-level fixed effects. The omitted group is 
future applicants, that is, teachers who have not applied but will in the future. Standard 
errors are clustered by teacher. 
CPS PLAN                               (1)       (2)       (3)       (4)
Current applicants
  effect size                        0.023     0.029     0.019     0.023
  std. error                         0.023     0.023     0.023     0.022
  p-value                            0.321     0.196     0.427     0.307
Past applicants
  effect size                        0.017    -0.002    -0.002    -0.008
  std. error                         0.025     0.025     0.025     0.025
  p-value                            0.511     0.922     0.951     0.751
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes
  Teacher experience                   Yes       Yes       Yes       Yes
  School characteristics                No       Yes       Yes        No
  Teacher FE                           Yes       Yes       Yes       Yes
  School FE                             No        No        No       Yes
  Average incoming test score           No        No       Yes       Yes
observations                       209,223   209,223   209,223   209,223
schools                                 99
teachers                             2,360     2,360     2,360     2,360
R²                                    0.64      0.64      0.64      0.64
CPS ACT                                (1)       (2)       (3)       (4)
Grade 10 teacher
Current applicants
  effect size                        0.019    -0.006     0.021     0.004
  std. error                         0.021     0.021     0.021     0.022
  p-value                            0.371     0.763     0.314     0.855
Past applicants
  effect size                       -0.012    -0.038    -0.016    -0.026
  std. error                         0.027     0.027     0.027     0.026
  p-value                            0.648     0.159     0.554     0.312
Grade 11 teacher
Current applicants
  effect size                        0.005    -0.005     0.000     0.002
  std. error                         0.023     0.024     0.025     0.024
  p-value                            0.832     0.839     0.986     0.929
Past applicants
  effect size                       -0.004    -0.025    -0.017    -0.020
  std. error                         0.026     0.025     0.027     0.026
  p-value                            0.887     0.324     0.514     0.436
Additional controls:
  Student characteristics              Yes       Yes       Yes       Yes
  Teacher experience                   Yes       Yes       Yes       Yes
  School characteristics                No       Yes       Yes        No
  Teacher FE                           Yes       Yes       Yes       Yes
  School FE                             No        No        No       Yes
  Average incoming test score           No        No       Yes       Yes
Observations                       143,898   143,898   143,898   143,898
Schools                                 94
Teachers                             2,856     2,856     2,856     2,856
R²                                    0.70      0.70
Appendix F. Analysis of ceiling effects on instructional improvement 


To determine whether teachers with lower baseline scores for 
instructional quality improved more between baseline and subsequent 
observations, we created a single vector measuring the standardized 
score change on each subscale. Next, we ran a regression model that 
included a dichotomous variable for whether the teacher was a National 
Board applicant (1=yes, 0=no) and a series of variables indicating the 
quartile of the teacher’s rating at baseline. We added interaction 
terms between the National Board-applicant variable and the quartile 
variables to test whether applicants in the bottom quartile of ratings 
at baseline experienced more growth in ratings than did applicants in 
the top quartile. We also included control variables for each of the 
subscales and the time point of the observation. The model used robust 
standard errors, clustered on teacher. 


The regression results indicate no statistically significant effect for 
the National Board-applicant variable, or for any of the interaction 
terms between the applicant variable and the quartile of baseline 
performance. 
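
The structure of this regression can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function names, the choice of the top quartile as the omitted category, and the cluster-robust formula without a small-sample correction are all assumptions. The subscale and time-point controls are left out for brevity.

```python
import numpy as np

def design_matrix(applicant, quartile):
    """Columns: intercept, applicant dummy, baseline-quartile dummies Q1-Q3
    (Q4, the top quartile, is the omitted category), applicant x Q1-Q3."""
    applicant = np.asarray(applicant, dtype=float)
    quartile = np.asarray(quartile)
    q_dummies = np.column_stack([(quartile == q).astype(float) for q in (1, 2, 3)])
    interactions = q_dummies * applicant[:, None]
    return np.column_stack([np.ones_like(applicant), applicant, q_dummies, interactions])

def ols_cluster(X, y, clusters):
    """OLS coefficients with cluster-robust (sandwich) standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(clusters):
        idx = clusters == c
        score = X[idx].T @ resid[idx]  # summed score within one teacher
        meat += np.outer(score, score)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))

# Synthetic data (hypothetical): 80 teachers, 5 subscale change scores each.
rng = np.random.default_rng(0)
teacher = np.repeat(np.arange(80), 5)
applicant = teacher % 2                # half of the teachers are applicants
quartile = (teacher // 2) % 4 + 1      # baseline quartile, 1 = bottom
X = design_matrix(applicant, quartile)
# Made-up coefficients: growth differs by quartile, no applicant effect.
true_beta = np.array([0.1, 0.0, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=0.01, size=len(teacher))

beta, se = ols_cluster(X, y, teacher)
for name, b, s in zip(["const", "applicant", "Q1", "Q2", "Q3",
                       "app x Q1", "app x Q2", "app x Q3"], beta, se):
    print(f"{name:10s} {b: .3f} ({s:.3f})")
```

In this layout the coefficient on the applicant-by-Q1 interaction is the quantity of interest: it compares growth for bottom-quartile applicants with growth for top-quartile applicants, net of the corresponding gap among non-applicants.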
Glossary 


CPS     Chicago Public Schools district 
EPAS    Educational Planning and Assessment System 
ESL     English as a Second Language 
FRL     free or reduced-price lunch 
IEP     Individualized Education Program 
KDE     Kentucky Department of Education 
LBD     Leadership by Design (classroom observation instrument) 
NAEP    National Assessment of Educational Progress 
NBC     National Board certification 
NBCT    National Board-certified teacher 
NBPTS   National Board for Professional Teaching Standards 
NCLB    No Child Left Behind 
SY      school year 